|
|
#133 | |
|
Dec 2011
New York, U.S.A.
97₁₀ Posts |
Quote:
I don't understand why your fix would work. Is the only change you made to add an extra DeviceToHost copy of maxErr?
But I did see that the init_device() routine makes four CUDA device calls and, unlike the rest of the software, it does not use cutilSafeCall() to check for errors. In particular, I think this call may be important:
Code:
cudaSetDeviceFlags(cudaDeviceBlockingSync);
However, I do not understand why that call would be failing. I would suggest adding cutilSafeCall() to the API calls inside the init_device() function. That seems like a good thing to do regardless of whether this is related to the 100% CPU core problem. I'm adding that to my version of the code.
Mike |
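For anyone who wants to see the shape of that suggestion without the CUDA SDK at hand, here is a minimal sketch in plain C. Everything in it is a stand-in: `status_t` plays the role of `cudaError_t` (0 meaning success), and `fake_set_device()` / `fake_set_flags()` are hypothetical placeholders for the real calls in init_device(). It only illustrates the idea of never letting a return status go unchecked; it is not the SDK's cutilSafeCall().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for cudaError_t; 0 plays the role of cudaSuccess. */
typedef int status_t;

/* Report a failed call with its source location and pass the status through. */
status_t check_call(status_t status, const char *expr, const char *file, int line)
{
    assert(expr != NULL && file != NULL);
    if (status != 0)
        fprintf(stderr, "%s:%d: %s failed (status %d)\n", file, line, expr, status);
    return status;
}

/* Wrap any call that returns a status so a failure is never silent. */
#define SAFE_CALL(expr) check_call((expr), #expr, __FILE__, __LINE__)

/* Hypothetical setup steps standing in for the device calls in init_device(). */
status_t fake_set_device(int dev)       { return dev >= 0 ? 0 : 1; }
status_t fake_set_flags(unsigned flags) { return flags != 0 ? 0 : 2; }

/* Returns 0 on success, or the status of the first step that failed. */
status_t init_device_checked(int dev, unsigned flags)
{
    status_t s;
    if ((s = SAFE_CALL(fake_set_device(dev))) != 0) return s;
    if ((s = SAFE_CALL(fake_set_flags(flags))) != 0) return s;
    return 0;
}
```

With the real API the wrapped expressions would be the actual runtime calls, and a failing cudaSetDeviceFlags() would then show up with a file and line number instead of passing unnoticed.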
|
|
|
|
|
|
#134 | |||
|
Jul 2009
Tokyo
262₁₆ Posts |
Hi, AG5BPilot
Quote:
I think this code will trigger.
Quote:
Quote:
Anyway, it is a strange phenomenon. |
|||
|
|
|
|
|
#135 |
|
Jul 2009
Tokyo
262₁₆ Posts |
Hi,
Versions 1.06, 1.061, and 1.062 have a fatal error on 32-bit OSes. I introduced a bug in the GMP-related code. Do not use them on a 32-bit OS. Thank you, |
|
|
|
|
|
#136 |
|
Dec 2011
2·7 Posts |
|
|
|
|
|
|
#137 |
|
Jul 2009
Tokyo
2·5·61 Posts |
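Hi,
llrCUDA was OK.
Code:
m_b = b;
mpz_init_set_ui(m_Na,m_b);
/* square log2(m) times: m_Na = b^m when m is a power of two */
for(j = m; j != 1; j/=2)
    mpz_mul(m_Na,m_Na,m_Na);
m_Na_size = mpz_size(m_Na);                      /* number of limbs */
m_Na_size_byte = m_Na_size*sizeof(long int);
m_a = (long int*) malloc(m_Na_size*sizeof(long int));
/* export the whole value as a single word of m_Na_size_byte bytes, native byte order */
mpz_export(m_a,NULL,0,m_Na_size_byte,0,0,m_Na);
Code:
if(m_a_32[m_a_32_len-1]== 0) m_a_32_len--; |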
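The thread does not show how `m_a_32` is built, so the following is only a guess at the intent of that one-line check: if the exported value is held in 64-bit words and then viewed as 32-bit words, the top 32-bit word can be zero and needs to be dropped. A minimal sketch, with `split_to_32()` as a hypothetical helper name and least-significant-word-first order assumed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split n 64-bit words (least significant first) into 2*n 32-bit words
   (also least significant first) and return the trimmed length.
   out must have room for 2*n entries. */
size_t split_to_32(const uint64_t *in, size_t n, uint32_t *out)
{
    assert(in != NULL && out != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (uint32_t)(in[i] & 0xFFFFFFFFu);  /* low half  */
        out[2 * i + 1] = (uint32_t)(in[i] >> 32);          /* high half */
    }
    size_t len = 2 * n;
    /* Same idea as the snippet above: drop the top word if it is zero. */
    if (out[len - 1] == 0)
        len--;
    return len;
}
```

Like the original check, this trims at most one word, which is enough when the 64-bit input has no leading zero word of its own.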
|
|
|
|
|
#138 |
|
Jul 2009
Tokyo
610₁₀ Posts |
Hi,
Here is a mini test program.
Code:
#include <stdio.h>
#include <gmp.h>

int main()
{
    mpz_t m;
    int i,j;
    unsigned long long e_m[10];   /* 80-byte export buffer */

    mpz_init_set_ui(m,2);
    for(j = 0; j != 8; j++)
    {
        mpz_mul(m,m,m);           /* m = 2^(2^(j+1)) */
        for(i=0;i<10;i++) e_m[i]=0;
        /* export m as one 80-byte word in native byte order */
        mpz_export(e_m,NULL,0,80,0,0,m);
        printf(" mpz_size(m)=%d ",(int) mpz_size(m));
        for(i=9;i>=0;i--) printf(" %llx",e_m[i]);   /* highest index first */
        printf("\n");
    }
    mpz_clear(m);
    return 0;
}
Code:
$ ./a.out
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 4
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10000
mpz_size(m)=2 0 0 0 0 0 0 0 0 0 100000000
mpz_size(m)=3 0 0 0 0 0 0 0 0 1 0
mpz_size(m)=5 0 0 0 0 0 0 0 1 0 0
mpz_size(m)=9 0 0 0 0 0 1 0 0 0 0
Code:
$ ./a.out
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 4
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10000
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100000000
mpz_size(m)=2 0 0 0 0 0 0 0 0 1 0
mpz_size(m)=3 0 0 0 0 0 0 0 1 0 0
mpz_size(m)=5 0 0 0 0 0 1 0 0 0 0 |
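The two runs print the same words but different mpz_size() values. The thread does not label which build produced which run, but the counts are consistent with 32-bit limbs in the first run and 64-bit limbs in the second, since mpz_size() counts limbs. A small GMP-free sketch of that arithmetic, with `limbs_for_pow2()` and `pow2_words()` as hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of limbs of width limb_bits needed to hold 2^exp,
   which is exp+1 bits long. */
size_t limbs_for_pow2(unsigned exp, unsigned limb_bits)
{
    assert(limb_bits > 0);
    return exp / limb_bits + 1;
}

/* Write 2^exp into nwords 64-bit words, least significant word first. */
void pow2_words(unsigned exp, uint64_t *words, size_t nwords)
{
    assert(words != NULL && exp / 64 < nwords);
    for (size_t i = 0; i < nwords; i++)
        words[i] = 0;
    words[exp / 64] = (uint64_t)1 << (exp % 64);
}
```

For the last line of each run the value is 2^256: `limbs_for_pow2(256, 32)` gives 9 and `limbs_for_pow2(256, 64)` gives 5, while `pow2_words(256, w, 10)` sets only `w[4]`, which is the single 1 in the printed row.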
|
|
|
|
|
#139 | |
|
Dec 2011
2·7 Posts |
Hmm, I could not get it to compile:
Quote:
|
|
|
|
|
|
|
#140 |
|
"Mark"
Apr 2003
Between here and the
14324₈ Posts |
|
|
|
|
|
|
#141 |
|
Jul 2009
Tokyo
1142₈ Posts |
Code:
mpz_mul(m_Na,m_Na,m_Na);
m_Na_size = (mpz_sizeinbase(m_Na,2)+sizeof(long long int)*8-1)/(sizeof(long long int)*8);
m_Na_size_byte = m_Na_size*sizeof(long long int); |
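The size computation in this snippet is a ceiling division of the bit length by the word width, so the result no longer depends on how wide a GMP limb is on the build platform. A minimal sketch of just that arithmetic, with `words_for_bits()` and `bytes_for_bits()` as hypothetical helper names and `bits` standing for what mpz_sizeinbase(m_Na,2) returns:

```c
#include <assert.h>
#include <stddef.h>

/* Words of word_bytes bytes needed to hold a value that is bits long:
   ceil(bits / (8 * word_bytes)). */
size_t words_for_bits(size_t bits, size_t word_bytes)
{
    assert(word_bytes > 0);
    size_t word_bits = word_bytes * 8;
    return (bits + word_bits - 1) / word_bits;
}

/* Buffer size in bytes for that many words. */
size_t bytes_for_bits(size_t bits, size_t word_bytes)
{
    return words_for_bits(bits, word_bytes) * word_bytes;
}
```

For a 257-bit value such as 2^256, `words_for_bits(257, 8)` is 5 and `bytes_for_bits(257, 8)` is 40, whatever the limb width of the GMP build.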
|
|
|
|
|
#142 | |||
|
Dec 2011
2×7 Posts |
Lubuntu11.10-32
Quote:
Quote:
Quote:
Last fiddled with by rroonnaalldd on 2012-01-28 at 10:17 Reason: a64-output added |
|||
|
|
|
|
|
#143 |
|
Jun 2003
5085₁₀ Posts |
Does anyone have any idea of the relative time spent in the "fft square" routine vs. the "next step" kernels in genefer?
|
|
|
|