2010-01-10, 20:38  #45
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5729_{10} Posts 


2010-01-10, 21:35  #46
"Oliver"
Mar 2005
Germany
3^{3}·41 Posts 
Hi Henry,
try this one. I won't be surprised if it doesn't work (library versions, ...).
I mistyped the model name of the 8600: I have an 8600GT here, not an 8600GTS. The GTS is faster.
Oliver
2010-01-11, 09:30  #47
Jul 2009
Tokyo
2×5×61 Posts 
Hi,
On Ubuntu 9.04 / 32-bit / GTX260:
Code:
$ time ./mfaktc.exe 66362159 64 65
mfaktc v0.01
C...
...
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 273133msec

real 4m33.207s
user 4m30.925s
sys 0m2.288s
2010-01-11, 10:08  #48
"Oliver"
Mar 2005
Germany
3^{3}×41 Posts 
Hi msft,
can you post the compile-time options and your CPU (Q8400?), too? I'm pretty sure this run was CPU-limited. If you want, try running 2 (or maybe even 3) processes at the same time (in different directories, because all processes try to access results.txt).
2010-01-11, 10:36  #49
Jul 2009
Tokyo
610_{10} Posts 
Yes Q8400.
Code:
#!/bin/bash -x
mkdir compile_bla_bla
cd compile_bla_bla
gcc -Wall -O2 -c ../sieve.c -o sieve.o
nvcc -c ../mfaktc.cu -o mfaktc.o -I /NVIDIA_GPU_Computing_SDK/C/common/inc/ --ptxas-options=-v --keep -DMUL24HI
mv mfaktc.ptx mfaktc.ptx.old
cat mfaktc.ptx.old | sed s/mul\.hi\.u32/mul24\.hi\.u32/ > mfaktc.ptx
rm -f mfaktc.sm_10.cubin mfaktc.cu.cpp mfaktc.o
ptxas --key="xxxxxxxxxx" -arch=sm_10 -v "mfaktc.ptx" -o "mfaktc.sm_10.cubin"
fatbin --key="xxxxxxxxxx" --source-name="../mfaktc.cu" --usage-mode="-v " --embedded-fatbin="mfaktc.fatbin.c" "--image=profile=sm_10,file=mfaktc.sm_10.cubin" "--image=profile=compute_10,file=mfaktc.ptx"
cudafe++ --gnu_version=40302 --diag_error=host_device_limited_call --diag_error=ms_asm_decl_not_allowed --parse_templates --gen_c_file_name "mfaktc.cudafe1.cpp" --stub_file_name "mfaktc.cudafe1.stub.c" --stub_header_file_name "mfaktc.cudafe1.stub.h" "mfaktc.cpp1.ii"
gcc -D__CUDA_ARCH__=100 -E -x c++ -DCUDA_NO_SM_12_ATOMIC_INTRINSICS -DCUDA_NO_SM_13_DOUBLE_INTRINSICS -DCUDA_FLOAT_MATH_FUNCTIONS -DCUDA_NO_SM_11_ATOMIC_INTRINSICS "-I /NVIDIA_GPU_Computing_SDK/C/common/inc/" -I/usr/local/cuda/include/ -I. -o "mfaktc.cu.cpp" "mfaktc.cudafe1.cpp"
gcc -c -x c++ "-I /NVIDIA_GPU_Computing_SDK/C/common/inc/" -I/usr/local/cuda/include/ -I. -o "mfaktc.o" "mfaktc.cu.cpp"
gcc -fPIC -o ../mfaktc.exe sieve.o mfaktc.o -L/usr/local/lib -L/usr/local/cuda/lib -L/NVIDIA_GPU_Computing_SDK/C/lib -L/NVIDIA_GPU_Computing_SDK/C/common/common/lib/linux -lcudart -L/usr/local/cuda/lib -L/NVIDIA_GPU_Computing_SDK/C/lib -L/NVIDIA_GPU_Computing_SDK/C/common/lib/linux -lcufft -lm
cd ..
rm compile_bla_bla -rf
Code:
$ time ./mfaktc.exe 66362159 64 65
mfaktc v0.01
C...
...
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 273291msec

real 4m33.374s
user 4m31.081s
sys 0m2.304s

$ time ./mfaktc.exe 66362159 64 65 &
$ time ./mfaktc.exe 66362159 64 65 &
...
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 274948msec

real 4m35.055s
user 4m31.613s
sys 0m3.392s
class 417: tested 265712378014859264 candidates in 12176232284160ms (93725704046247936/sec)
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 275090msec

real 4m35.173s
user 4m31.745s
sys 0m3.356s
2010-01-11, 11:46  #50
"Oliver"
Mar 2005
Germany
3^{3}×41 Posts 
Thank you, msft!
Actually I was asking for this:
Code:
Compiletime Options
THREADS_PER_GRID 1048576
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 230945bits
SIEVE_PRIMES 250000
USE_PINNED_MEMORY enabled
USE_ASYNC_COPY enabled
VERBOSE_TIMING disabled
SELFTEST disabled
MORE_CLASSES disabled
275s for two runs at the same time from 2^64 to 2^65 of M66362159 looks reasonable (still a little bit CPU-limited). My 275GTX paired with a fast Core 2 Duo does it in ~220 seconds.
Can you edit mfaktc.cu line 615: replace
Code:
printf("class %4d: tested...
with
Code:
printf("class %4Lu: tested...
Code:
./mfaktc.exe 66362159 1 64
mfaktc v0.01
...
Compiletime Options
THREADS_PER_GRID 1048576
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 230945bits
SIEVE_PRIMES 250000
USE_PINNED_MEMORY enabled
USE_ASYNC_COPY enabled
VERBOSE_TIMING disabled
SELFTEST disabled
MORE_CLASSES disabled
tf(66362159, 1, 64);
k_min = 0
k_max = 138985412407
sieve_init(): sieving factor candidates with small primes up to 3497867
class 0: tested 54525952 candidates in 9014ms (6049029/sec)
class 4: tested 54525952 candidates in 9014ms (6049029/sec)
...
class 49: tested 54525952 candidates in 9014ms (6049029/sec)
Result[00]: M66362159 has a factor: 6901664537
...
class 61: tested 54525952 candidates in 9014ms (6049029/sec)
Result[00]: M66362159 has a factor: 9157977943
...
class 301: tested 54525952 candidates in 9015ms (6048358/sec)
Result[00]: M66362159 has a factor: 124246422648815633
...
class 417: tested 54525952 candidates in 9014ms (6049029/sec)
found 3 factors for M66362159 with 1 to 64 bits
tf(): total time spent: 891193msec
If you want to spend more time on this: please edit params.h and enable "SELFTEST" and "MORE_CLASSES" (remove // from the defines). It should find one factor per Mersenne number (check results.txt after the run).
Last fiddled with by TheJudger on 2010-01-11 at 11:52
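A reported factor can be double-checked independently of mfaktc: q divides M_p = 2^p - 1 exactly when 2^p mod q = 1. The following is a minimal sketch of that check, not code taken from mfaktc. The names `mulmod64` and `divides_mersenne` are made up for illustration, and the sketch assumes a compiler with the `unsigned __int128` extension (GCC or Clang) and a factor below 2^64, which covers the three factors in the 1 to 64 bit run above.

```c
#include <stdint.h>

/* (a * b) mod m without overflow.
   Assumption: the compiler provides unsigned __int128 (GCC/Clang extension). */
static uint64_t mulmod64(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)(((unsigned __int128)a * b) % m);
}

/* Returns 1 if q divides 2^p - 1, i.e. if 2^p mod q == 1, else 0.
   Intended for odd q > 1; uses right-to-left binary exponentiation. */
int divides_mersenne(uint64_t p, uint64_t q)
{
    uint64_t result = 1 % q;   /* running value of 2^(processed bits) mod q */
    uint64_t base = 2 % q;     /* 2^(2^i) mod q for the current bit i */

    while (p > 0) {
        if (p & 1)
            result = mulmod64(result, base, q);
        base = mulmod64(base, base, q);
        p >>= 1;
    }
    return result == 1;
}
```

For example, `divides_mersenne(66362159, 6901664537ULL)` should return 1 for the first factor listed above. Factors wider than 64 bits would need wider arithmetic than this sketch provides.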

2010-01-11, 12:30  #51
Jul 2009
Tokyo
262_{16} Posts 
Hi,
Code:
Compiletime Options
THREADS_PER_GRID 1048576
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 230945bits
SIEVE_PRIMES 50000
USE_PINNED_MEMORY enabled
USE_ASYNC_COPY enabled
VERBOSE_TIMING disabled
SELFTEST disabled
MORE_CLASSES disabled
Code:
$ cat results.txt
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
M50804297 has a factor: 180620316395899877719
M50725243 has a factor: 230316474510833959177
M49635893 has a factor: 280164061095680036711
M51332417 has a factor: 297892586972172587537
M51413951 has a factor: 317216341513975685569
M51265327 has a factor: 348552331323478392193
M50787953 has a factor: 408564895570348290031
M51161503 has a factor: 415469688496323219041
M51061601 has a factor: 427900063728254374393
M51082547 has a factor: 465935689349117544521
M51437311 has a factor: 503858403232211768047
M51486859 has a factor: 510284989447684180297
M51408359 has a factor: 522238472503709826367
M51532279 has a factor: 541792563550794873377
M50751637 has a factor: 550221472071174741833
M51302663 has a factor: 603656963178941666303
M51163433 has a factor: 684192107898332819377
M50896831 has a factor: 705640111241611518359
M51375383 has a factor: 713108825973682051703
M51133343 has a factor: 796838010410767671769
M51023447 has a factor: 931398820964215340641
M50863909 has a factor: 959145688648033584641
M50920721 has a factor: 1253793135671017237321
M48630643 has a factor: 1396673413347982098001
M51250613 has a factor: 1412902407482377985447
M51406301 has a factor: 1426645377855974696807
M50893061 has a factor: 1441854080374870808777
M50979079 has a factor: 1443184588520125697329
M51064417 has a factor: 1464103704184177492831
M51293899 has a factor: 1595148557829097879457
M51132959 has a factor: 1609354388906437820393
M51125413 has a factor: 1754609807377017622201
M50781589 has a factor: 1771605458538879435223
M51321659 has a factor: 1782972607557912437543
M49715873 has a factor: 2029034084175690064751
M49915309 has a factor: 2085962683046854861393
M51152869 has a factor: 2105744115640061414321
M50909147 has a factor: 2218183397480493562177
M51340871 has a factor: 2283988614248258513047
M47644171 has a factor: 2357049767161724465927

2010-01-11, 12:55  #52
"Oliver"
Mar 2005
Germany
1107_{10} Posts 
Thank you!
This was with the modified printf in mfaktc.cu line 615, right? results.txt and the screen output (typescript.gz) are as expected. :)
I just noticed another bug. Look at results.txt:
Code:
no factor for M66362159 from 2^64 to 2^65 bits
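On the printf change discussed above: the garbage numbers in the earlier `class 417` line are what one would expect when a format specifier does not match the width of its argument, because every later argument is then read from the wrong position. That is a guess at the cause, since the full format string at line 615 is not shown in this thread. A portable way to print 64-bit counters is the `PRIu64` macro from `<inttypes.h>`. The sketch below rebuilds the progress line from the log output; `format_class_line` and its parameters are hypothetical names, not mfaktc's.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Formats one per-class progress line into buf and returns snprintf's result.
   Hypothetical helper: the line layout is reconstructed from the log output. */
int format_class_line(char *buf, size_t len, unsigned int class_id,
                      uint64_t candidates, uint64_t elapsed_ms)
{
    /* Integer candidates per second; a 0 ms interval reports a rate of 0. */
    uint64_t rate = elapsed_ms ? candidates * 1000 / elapsed_ms : 0;

    /* Each specifier matches its argument type: %u for unsigned int,
       PRIu64 for uint64_t. */
    return snprintf(buf, len,
                    "class %4u: tested %" PRIu64 " candidates in %" PRIu64
                    "ms (%" PRIu64 "/sec)",
                    class_id, candidates, elapsed_ms, rate);
}
```

Compiling with `gcc -Wall` (which enables `-Wformat`) should flag mismatched specifiers of this kind at build time.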
2010-01-11, 13:14  #53
Jul 2009
Tokyo
2·5·61 Posts 

2010-01-11, 18:20  #54
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
17×337 Posts 
The binary works fine. However, running it makes my PC respond slowly, and it becomes almost unusable. Do you have any suggestions to cure this? Would increasing the sieve bound to make it CPU-bound help? The PC seems to respond every time mfaktc moves on to the next class. Here is a benchmark, which is the same as the first one in #49.
Code:
time ./mfaktc.exe 66362159 64 65
mfaktc v0.01
Copyright (C) 2009, 2010 Oliver Weihe (o.weihe@t-online.de)
This program comes with ABSOLUTELY NO WARRANTY; for details see COPYING.
This is free software, and you are welcome to redistribute it
under certain conditions; see COPYING for details.
Compiletime Options
THREADS_PER_GRID 1048576
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 230945bits
SIEVE_PRIMES 50000
USE_PINNED_MEMORY enabled
USE_ASYNC_COPY enabled
VERBOSE_TIMING disabled
SELFTEST disabled
MORE_CLASSES disabled
tf(66362159, 64, 65);
k_min = 138985412160
k_max = 277970824814
sieve_init(): sieving factor candidates with small primes up to 611957
class 0: tested 61865984 candidates in 8318ms (7437603/sec)
class 4: tested 61865984 candidates in 8311ms (7443867/sec)
class 9: tested 61865984 candidates in 8311ms (7443867/sec)
class 12: tested 61865984 candidates in 8314ms (7441181/sec)
class 16: tested 61865984 candidates in 8307ms (7447452/sec)
class 21: tested 61865984 candidates in 8315ms (7440286/sec)
class 24: tested 61865984 candidates in 8303ms (7451039/sec)
class 25: tested 61865984 candidates in 8312ms (7442972/sec)
class 37: tested 61865984 candidates in 8311ms (7443867/sec)
class 40: tested 61865984 candidates in 8311ms (7443867/sec)
class 45: tested 61865984 candidates in 8311ms (7443867/sec)
class 49: tested 61865984 candidates in 8312ms (7442972/sec)
class 52: tested 61865984 candidates in 8313ms (7442076/sec)
class 60: tested 61865984 candidates in 8319ms (7436709/sec)
class 61: tested 61865984 candidates in 8316ms (7439392/sec)
class 69: tested 61865984 candidates in 8309ms (7445659/sec)
class 72: tested 61865984 candidates in 8304ms (7450142/sec)
class 76: tested 61865984 candidates in 8309ms (7445659/sec)
class 81: tested 61865984 candidates in 8316ms (7439392/sec)
class 84: tested 61865984 candidates in 8317ms (7438497/sec)
class 96: tested 61865984 candidates in 8314ms (7441181/sec)
class 97: tested 61865984 candidates in 8314ms (7441181/sec)
class 100: tested 61865984 candidates in 8314ms (7441181/sec)
class 105: tested 61865984 candidates in 8317ms (7438497/sec)
class 109: tested 61865984 candidates in 8311ms (7443867/sec)
class 112: tested 61865984 candidates in 8314ms (7441181/sec)
class 117: tested 61865984 candidates in 8318ms (7437603/sec)
class 121: tested 61865984 candidates in 8314ms (7441181/sec)
class 124: tested 61865984 candidates in 8303ms (7451039/sec)
class 129: tested 61865984 candidates in 8308ms (7446555/sec)
class 132: tested 61865984 candidates in 68309ms (905678/sec)
class 136: tested 61865984 candidates in 68353ms (905095/sec)
class 144: tested 61865984 candidates in 8321ms (7434921/sec)
class 145: tested 61865984 candidates in 8317ms (7438497/sec)
class 156: tested 61865984 candidates in 8309ms (7445659/sec)
class 157: tested 61865984 candidates in 8313ms (7442076/sec)
class 160: tested 61865984 candidates in 8315ms (7440286/sec)
class 165: tested 61865984 candidates in 8313ms (7442076/sec)
class 172: tested 61865984 candidates in 8310ms (7444763/sec)
class 177: tested 61865984 candidates in 8315ms (7440286/sec)
class 180: tested 61865984 candidates in 8313ms (7442076/sec)
class 181: tested 61865984 candidates in 8310ms (7444763/sec)
class 184: tested 61865984 candidates in 8316ms (7439392/sec)
class 189: tested 61865984 candidates in 8305ms (7449245/sec)
class 192: tested 61865984 candidates in 8308ms (7446555/sec)
class 196: tested 61865984 candidates in 8316ms (7439392/sec)
class 201: tested 61865984 candidates in 8314ms (7441181/sec)
class 205: tested 61865984 candidates in 8313ms (7442076/sec)
class 216: tested 61865984 candidates in 8308ms (7446555/sec)
class 217: tested 61865984 candidates in 8315ms (7440286/sec)
class 220: tested 61865984 candidates in 8313ms (7442076/sec)
class 229: tested 61865984 candidates in 8307ms (7447452/sec)
class 237: tested 61865984 candidates in 8315ms (7440286/sec)
class 240: tested 61865984 candidates in 8303ms (7451039/sec)
class 241: tested 61865984 candidates in 8311ms (7443867/sec)
class 244: tested 61865984 candidates in 8315ms (7440286/sec)
class 249: tested 61865984 candidates in 8317ms (7438497/sec)
class 252: tested 61865984 candidates in 8311ms (7443867/sec)
class 256: tested 61865984 candidates in 8313ms (7442076/sec)
class 261: tested 61865984 candidates in 8316ms (7439392/sec)
class 264: tested 61865984 candidates in 8307ms (7447452/sec)
class 265: tested 61865984 candidates in 8316ms (7439392/sec)
class 276: tested 61865984 candidates in 8303ms (7451039/sec)
class 277: tested 61865984 candidates in 8314ms (7441181/sec)
class 280: tested 61865984 candidates in 8311ms (7443867/sec)
class 285: tested 61865984 candidates in 8316ms (7439392/sec)
class 289: tested 61865984 candidates in 8317ms (7438497/sec)
class 292: tested 61865984 candidates in 8313ms (7442076/sec)
class 297: tested 61865984 candidates in 8314ms (7441181/sec)
class 300: tested 61865984 candidates in 8314ms (7441181/sec)
class 301: tested 61865984 candidates in 8317ms (7438497/sec)
class 304: tested 61865984 candidates in 8309ms (7445659/sec)
class 312: tested 61865984 candidates in 8317ms (7438497/sec)
class 321: tested 61865984 candidates in 8313ms (7442076/sec)
class 324: tested 61865984 candidates in 8316ms (7439392/sec)
class 325: tested 61865984 candidates in 8315ms (7440286/sec)
class 336: tested 61865984 candidates in 8313ms (7442076/sec)
class 340: tested 61865984 candidates in 8313ms (7442076/sec)
class 345: tested 61865984 candidates in 8316ms (7439392/sec)
class 349: tested 61865984 candidates in 8312ms (7442972/sec)
class 352: tested 61865984 candidates in 8318ms (7437603/sec)
class 357: tested 61865984 candidates in 8314ms (7441181/sec)
class 360: tested 61865984 candidates in 8313ms (7442076/sec)
class 361: tested 61865984 candidates in 8315ms (7440286/sec)
class 364: tested 61865984 candidates in 8317ms (7438497/sec)
class 369: tested 61865984 candidates in 8313ms (7442076/sec)
class 376: tested 61865984 candidates in 8315ms (7440286/sec)
class 381: tested 61865984 candidates in 8315ms (7440286/sec)
class 384: tested 61865984 candidates in 8314ms (7441181/sec)
class 385: tested 61865984 candidates in 8316ms (7439392/sec)
class 396: tested 61865984 candidates in 8313ms (7442076/sec)
class 397: tested 61865984 candidates in 8319ms (7436709/sec)
class 405: tested 61865984 candidates in 8317ms (7438497/sec)
class 409: tested 61865984 candidates in 8310ms (7444763/sec)
class 412: tested 61865984 candidates in 8312ms (7442972/sec)
class 417: tested 61865984 candidates in 8312ms (7442972/sec)
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 922393msec

real 15m22.494s
user 13m25.326s
sys 0m0.820s
I will now have a go at compiling it myself and see how I fare.
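The k_min and k_max values in these logs are consistent with the usual form of Mersenne factors, q = 2kp + 1: k_max matches floor(2^bit_max / (2p)), and k_min matches floor(2^bit_min / (2p)) rounded down to a multiple of 420, which appears to be the number of classes (the class ids above stay below 420). A small sketch of that arithmetic follows. The names `k_limit`, `class_aligned` and `NUM_CLASSES` are hypothetical, and the code assumes the `unsigned __int128` extension of GCC or Clang.

```c
#include <stdint.h>

/* Assumption: 420 classes, inferred from the class ids 0..417 in the logs. */
#define NUM_CLASSES 420u

/* floor(2^bits / (2p)): the largest k for which q = 2kp + 1 stays below 2^bits.
   Valid for 1 <= bits <= 126; the result is truncated if it exceeds 64 bits. */
uint64_t k_limit(uint64_t p, unsigned bits)
{
    unsigned __int128 bound = (unsigned __int128)1 << bits;
    return (uint64_t)(bound / (2 * (unsigned __int128)p));
}

/* Rounds k down to the start of its block of NUM_CLASSES consecutive values. */
uint64_t class_aligned(uint64_t k)
{
    return k - k % NUM_CLASSES;
}
```

With p = 66362159, `k_limit(p, 64)` gives 138985412407 (the k_max of the 1 to 64 bit run in #50), `class_aligned` of that value gives 138985412160, and `k_limit(p, 65)` gives 277970824814, matching the k_min and k_max printed above.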

2010-01-11, 18:48  #55
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
1011001100001_{2} Posts 
I just compiled successfully after changing the CUDA directory in the script.
The binary built with the old version of the script runs at two thirds the speed of the one built with the hack, which performs the same as your build. I will now try different sieve bounds.