#45
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3×23×89 Posts
Quote:
I often end up causing more trouble than the time it would take for you to post binaries. My platform is a Q6600. I have the CUDA software and will attempt to compile tomorrow on Ubuntu 9.04 64-bit. Hopefully it will go smoothly.

#46
"Oliver"
Mar 2005
Germany
1115₁₀ Posts
Hi Henry,
try this one. I won't be surprised if it doesn't work (library versions, ...). I mistyped the model name of the 8600: I have an 8600GT here, not an 8600GTS. The GTS is faster.
Oliver

#47
Jul 2009
Tokyo
2×5×61 Posts
Hi,
On Ubuntu 9.04 / 32-bit / GTX260:
Code:
$ time ./mfaktc.exe 66362159 64 65
mfaktc v0.01
C...
...
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 273133msec

real 4m33.207s
user 4m30.925s
sys 0m2.288s

#48
"Oliver"
Mar 2005
Germany
5·223 Posts
Hi msft,
can you post the compiletime options and your CPU (Q8400?), too? I'm pretty sure this run was CPU-limited. If you want, try running 2 (or maybe even 3) processes at the same time (in different directories, because all processes try to access results.txt).

#49
Jul 2009
Tokyo
2×5×61 Posts
Yes Q8400.
Code:
#!/bin/bash -x mkdir compile_bla_bla cd compile_bla_bla gcc -Wall -O2 -c ../sieve.c -o sieve.o nvcc -c ../mfaktc.cu -o mfaktc.o -I /NVIDIA_GPU_Computing_SDK/C/common/inc/ --ptxas-options=-v --keep -DMUL24HI mv mfaktc.ptx mfaktc.ptx.old cat mfaktc.ptx.old | sed s/mul\.hi\.u32/mul24\.hi\.u32/ > mfaktc.ptx rm -f mfaktc.sm_10.cubin mfaktc.cu.cpp mfaktc.o ptxas --key="xxxxxxxxxx" -arch=sm_10 -v "mfaktc.ptx" -o "mfaktc.sm_10.cubin" fatbin --key="xxxxxxxxxx" --source-name="../mfaktc.cu" --usage-mode="-v " --embedded-fatbin="mfaktc.fatbin.c" "--image=profile=sm_10,file=mfaktc.sm_10.cubin" "--image=profile=compute_10,file=mfaktc.ptx" cudafe++ --gnu_version=40302 --diag_error=host_device_limited_call --diag_error=ms_asm_decl_not_allowed --parse_templates --gen_c_file_name "mfaktc.cudafe1.cpp" --stub_file_name "mfaktc.cudafe1.stub.c" --stub_header_file_name "mfaktc.cudafe1.stub.h" "mfaktc.cpp1.ii" gcc -D__CUDA_ARCH__=100 -E -x c++ -DCUDA_NO_SM_12_ATOMIC_INTRINSICS -DCUDA_NO_SM_13_DOUBLE_INTRINSICS -DCUDA_FLOAT_MATH_FUNCTIONS -DCUDA_NO_SM_11_ATOMIC_INTRINSICS "-I /NVIDIA_GPU_Computing_SDK/C/common/inc/" -I/usr/local/cuda/include/ -I. -o "mfaktc.cu.cpp" "mfaktc.cudafe1.cpp" gcc -c -x c++ "-I /NVIDIA_GPU_Computing_SDK/C/common/inc/" -I/usr/local/cuda/include/ -I. -o "mfaktc.o" "mfaktc.cu.cpp" gcc -fPIC -o ../mfaktc.exe sieve.o mfaktc.o -L/usr/local/lib -L/usr/local/cuda/lib -L/NVIDIA_GPU_Computing_SDK/C/lib -L/NVIDIA_GPU_Computing_SDK/C/common/common/lib/linux -lcudart -L/usr/local/cuda/lib -L/NVIDIA_GPU_Computing_SDK/C/lib -L/NVIDIA_GPU_Computing_SDK/C/common/lib/linux -lcufft -lm cd .. rm compile_bla_bla -rf Code:
$ time ./mfaktc.exe 66362159 64 65
mfaktc v0.01
C...
...
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 273291msec

real 4m33.374s
user 4m31.081s
sys 0m2.304s

$ time ./mfaktc.exe 66362159 64 65 &
$ time ./mfaktc.exe 66362159 64 65 &
...
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 274948msec

real 4m35.055s
user 4m31.613s
sys 0m3.392s
class 417: tested 265712378014859264 candidates in 12176232284160ms (93725704046247936/sec)
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 275090msec

real 4m35.173s
user 4m31.745s
sys 0m3.356s

#50
"Oliver"
Mar 2005
Germany
45B₁₆ Posts
Thank you, msft!
Actually I was asking for this:
Code:
Compiletime Options
THREADS_PER_GRID 1048576
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 230945bits
SIEVE_PRIMES 250000
USE_PINNED_MEMORY enabled
USE_ASYNC_COPY enabled
VERBOSE_TIMING disabled
SELFTEST disabled
MORE_CLASSES disabled
275 s for two simultaneous runs from 2^64 to 2^65 of M66362159 looks reasonable (still a little bit CPU-limited). My 275GTX paired with a fast Core 2 Duo does it in ~220 seconds.
Quote:
Can you edit mfaktc.cu line 615? Replace
Code:
printf("class %4d: tested...
with
Code:
printf("class %4Lu: tested...
Code:
./mfaktc.exe 66362159 1 64
mfaktc v0.01
...
Compiletime Options
THREADS_PER_GRID 1048576
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 230945bits
SIEVE_PRIMES 250000
USE_PINNED_MEMORY enabled
USE_ASYNC_COPY enabled
VERBOSE_TIMING disabled
SELFTEST disabled
MORE_CLASSES disabled

tf(66362159, 1, 64);
k_min = 0
k_max = 138985412407
sieve_init(): sieving factor candidates with small primes up to 3497867
class 0: tested 54525952 candidates in 9014ms (6049029/sec)
class 4: tested 54525952 candidates in 9014ms (6049029/sec)
...
class 49: tested 54525952 candidates in 9014ms (6049029/sec)
Result[00]: M66362159 has a factor: 6901664537
...
class 61: tested 54525952 candidates in 9014ms (6049029/sec)
Result[00]: M66362159 has a factor: 9157977943
...
class 301: tested 54525952 candidates in 9015ms (6048358/sec)
Result[00]: M66362159 has a factor: 124246422648815633
...
class 417: tested 54525952 candidates in 9014ms (6049029/sec)
found 3 factors for M66362159 with 1 to 64 bits
tf(): total time spent: 891193msec
If you want to spend more time on this: please edit params.h and enable "SELFTEST" and "MORE_CLASSES" (remove the // from the defines). It should find one factor per Mersenne number (check results.txt after the run).
Last fiddled with by TheJudger on 2010-01-11 at 11:52
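Why the format specifier matters can be sketched in a few lines of C. This is only a sketch, not the code at mfaktc.cu line 615 (which is truncated in the post): it assumes the class number and the counters are 64-bit unsigned integers, and the names `rate_per_sec` and `format_class_line` are made up for illustration. When a 64-bit argument is printed with `%d`, a 32-bit build can read all following arguments from the wrong position, which would be consistent with the nonsense values in the 32-bit output in #49. `%Lu` is accepted by glibc; the portable spellings are `%llu` or the `PRIu64` macro used here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Candidates per second, truncated to an integer.
 * Hypothetical helper; the integer division matches the logged rates. */
uint64_t rate_per_sec(uint64_t candidates, uint64_t ms)
{
    return ms ? candidates * 1000u / ms : 0;
}

/* Format one "class" line with specifiers that match 64-bit arguments.
 * Returns what snprintf returns (length of the formatted line). */
int format_class_line(char *buf, size_t len, uint64_t cls,
                      uint64_t candidates, uint64_t ms)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "class %4" PRIu64 ": tested %" PRIu64
                    " candidates in %" PRIu64 "ms (%" PRIu64 "/sec)",
                    cls, candidates, ms, rate_per_sec(candidates, ms));
}
```

Calling `format_class_line(buf, sizeof buf, 0, 54525952, 9014)` produces the first class line of the 1 to 64 bit run, with the class number padded to four characters.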

#51
Jul 2009
Tokyo
2×5×61 Posts
Hi,
Code:
Compiletime Options
THREADS_PER_GRID 1048576
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 230945bits
SIEVE_PRIMES 50000
USE_PINNED_MEMORY enabled
USE_ASYNC_COPY enabled
VERBOSE_TIMING disabled
SELFTEST disabled
MORE_CLASSES disabled
Quote:
Code:
$ cat results.txt
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
M50804297 has a factor: 180620316395899877719
M50725243 has a factor: 230316474510833959177
M49635893 has a factor: 280164061095680036711
M51332417 has a factor: 297892586972172587537
M51413951 has a factor: 317216341513975685569
M51265327 has a factor: 348552331323478392193
M50787953 has a factor: 408564895570348290031
M51161503 has a factor: 415469688496323219041
M51061601 has a factor: 427900063728254374393
M51082547 has a factor: 465935689349117544521
M51437311 has a factor: 503858403232211768047
M51486859 has a factor: 510284989447684180297
M51408359 has a factor: 522238472503709826367
M51532279 has a factor: 541792563550794873377
M50751637 has a factor: 550221472071174741833
M51302663 has a factor: 603656963178941666303
M51163433 has a factor: 684192107898332819377
M50896831 has a factor: 705640111241611518359
M51375383 has a factor: 713108825973682051703
M51133343 has a factor: 796838010410767671769
M51023447 has a factor: 931398820964215340641
M50863909 has a factor: 959145688648033584641
M50920721 has a factor: 1253793135671017237321
M48630643 has a factor: 1396673413347982098001
M51250613 has a factor: 1412902407482377985447
M51406301 has a factor: 1426645377855974696807
M50893061 has a factor: 1441854080374870808777
M50979079 has a factor: 1443184588520125697329
M51064417 has a factor: 1464103704184177492831
M51293899 has a factor: 1595148557829097879457
M51132959 has a factor: 1609354388906437820393
M51125413 has a factor: 1754609807377017622201
M50781589 has a factor: 1771605458538879435223
M51321659 has a factor: 1782972607557912437543
M49715873 has a factor: 2029034084175690064751
M49915309 has a factor: 2085962683046854861393
M51152869 has a factor: 2105744115640061414321
M50909147 has a factor: 2218183397480493562177
M51340871 has a factor: 2283988614248258513047
M47644171 has a factor: 2357049767161724465927
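A reported factor can be checked independently of mfaktc: f divides M_p = 2^p - 1 exactly when 2^p ≡ 1 (mod f). The sketch below does this with square-and-multiply on the host. It is not part of mfaktc; the function names are hypothetical, and it assumes a compiler that provides `unsigned __int128` (a GCC/Clang extension), since the selftest factors in this list are larger than 64 bits.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128; /* assumed available (GCC/Clang extension) */

/* Parse a decimal string into a 128-bit value (no overflow checking). */
u128 parse_u128(const char *s)
{
    u128 v = 0;
    for (; *s >= '0' && *s <= '9'; ++s)
        v = v * 10 + (u128)(*s - '0');
    return v;
}

/* (a * b) mod m by double-and-add.
 * Requires m < 2^127 so that the additions cannot overflow. */
u128 mulmod(u128 a, u128 b, u128 m)
{
    u128 r = 0;
    a %= m;
    while (b) {
        if (b & 1) {
            r += a;
            if (r >= m) r -= m;
        }
        a += a;
        if (a >= m) a -= m;
        b >>= 1;
    }
    return r;
}

/* Returns 1 if f divides 2^p - 1, i.e. if 2^p mod f == 1. */
int divides_mersenne(uint32_t p, u128 f)
{
    assert(f > 1 && f < ((u128)1 << 127));
    u128 result = 1, base = 2 % f;
    while (p) {
        if (p & 1) result = mulmod(result, base, f);
        base = mulmod(base, base, f);
        p >>= 1;
    }
    return result == 1;
}
```

The first factor line would be checked with `divides_mersenne(50804297, parse_u128("180620316395899877719"))`. Small known cases such as M11 = 23 × 89 make a quick sanity test of the routine itself.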

#52
"Oliver"
Mar 2005
Germany
10001011011₂ Posts
Thank you!
This was with the modified printf in mfaktc.cu line 615, right? results.txt and the screen output (typescript.gz) are as expected. :)
I just noticed another bug. Look at results.txt:
Code:
no factor for M66362159 from 2^64 to 2^65 bits

#53
Jul 2009
Tokyo
2×5×61 Posts

#54
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3×23×89 Posts
Quote:
The binary works fine. However, running it makes my PC respond so slowly that it becomes almost unusable. Do you have any suggestions to cure this? Would increasing the sieve bound to make it CPU-bound help? The PC seems to respond each time the program moves on to the next class. Here is a benchmark, the same run as the first one in #49.
Code:
time ./mfaktc.exe 66362159 64 65
mfaktc v0.01
Copyright (C) 2009, 2010 Oliver Weihe (o.weihe@t-online.de)
This program comes with ABSOLUTELY NO WARRANTY; for details see COPYING.
This is free software, and you are welcome to redistribute it under certain conditions; see COPYING for details.

Compiletime Options
THREADS_PER_GRID 1048576
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 230945bits
SIEVE_PRIMES 50000
USE_PINNED_MEMORY enabled
USE_ASYNC_COPY enabled
VERBOSE_TIMING disabled
SELFTEST disabled
MORE_CLASSES disabled

tf(66362159, 64, 65);
k_min = 138985412160
k_max = 277970824814
sieve_init(): sieving factor candidates with small primes up to 611957
class 0: tested 61865984 candidates in 8318ms (7437603/sec)
class 4: tested 61865984 candidates in 8311ms (7443867/sec)
class 9: tested 61865984 candidates in 8311ms (7443867/sec)
class 12: tested 61865984 candidates in 8314ms (7441181/sec)
class 16: tested 61865984 candidates in 8307ms (7447452/sec)
class 21: tested 61865984 candidates in 8315ms (7440286/sec)
class 24: tested 61865984 candidates in 8303ms (7451039/sec)
class 25: tested 61865984 candidates in 8312ms (7442972/sec)
class 37: tested 61865984 candidates in 8311ms (7443867/sec)
class 40: tested 61865984 candidates in 8311ms (7443867/sec)
class 45: tested 61865984 candidates in 8311ms (7443867/sec)
class 49: tested 61865984 candidates in 8312ms (7442972/sec)
class 52: tested 61865984 candidates in 8313ms (7442076/sec)
class 60: tested 61865984 candidates in 8319ms (7436709/sec)
class 61: tested 61865984 candidates in 8316ms (7439392/sec)
class 69: tested 61865984 candidates in 8309ms (7445659/sec)
class 72: tested 61865984 candidates in 8304ms (7450142/sec)
class 76: tested 61865984 candidates in 8309ms (7445659/sec)
class 81: tested 61865984 candidates in 8316ms (7439392/sec)
class 84: tested 61865984 candidates in 8317ms (7438497/sec)
class 96: tested 61865984 candidates in 8314ms (7441181/sec)
class 97: tested 61865984 candidates in 8314ms (7441181/sec)
class 100: tested 61865984 candidates in 8314ms (7441181/sec)
class 105: tested 61865984 candidates in 8317ms (7438497/sec)
class 109: tested 61865984 candidates in 8311ms (7443867/sec)
class 112: tested 61865984 candidates in 8314ms (7441181/sec)
class 117: tested 61865984 candidates in 8318ms (7437603/sec)
class 121: tested 61865984 candidates in 8314ms (7441181/sec)
class 124: tested 61865984 candidates in 8303ms (7451039/sec)
class 129: tested 61865984 candidates in 8308ms (7446555/sec)
class 132: tested 61865984 candidates in 68309ms (905678/sec)
class 136: tested 61865984 candidates in 68353ms (905095/sec)
class 144: tested 61865984 candidates in 8321ms (7434921/sec)
class 145: tested 61865984 candidates in 8317ms (7438497/sec)
class 156: tested 61865984 candidates in 8309ms (7445659/sec)
class 157: tested 61865984 candidates in 8313ms (7442076/sec)
class 160: tested 61865984 candidates in 8315ms (7440286/sec)
class 165: tested 61865984 candidates in 8313ms (7442076/sec)
class 172: tested 61865984 candidates in 8310ms (7444763/sec)
class 177: tested 61865984 candidates in 8315ms (7440286/sec)
class 180: tested 61865984 candidates in 8313ms (7442076/sec)
class 181: tested 61865984 candidates in 8310ms (7444763/sec)
class 184: tested 61865984 candidates in 8316ms (7439392/sec)
class 189: tested 61865984 candidates in 8305ms (7449245/sec)
class 192: tested 61865984 candidates in 8308ms (7446555/sec)
class 196: tested 61865984 candidates in 8316ms (7439392/sec)
class 201: tested 61865984 candidates in 8314ms (7441181/sec)
class 205: tested 61865984 candidates in 8313ms (7442076/sec)
class 216: tested 61865984 candidates in 8308ms (7446555/sec)
class 217: tested 61865984 candidates in 8315ms (7440286/sec)
class 220: tested 61865984 candidates in 8313ms (7442076/sec)
class 229: tested 61865984 candidates in 8307ms (7447452/sec)
class 237: tested 61865984 candidates in 8315ms (7440286/sec)
class 240: tested 61865984 candidates in 8303ms (7451039/sec)
class 241: tested 61865984 candidates in 8311ms (7443867/sec)
class 244: tested 61865984 candidates in 8315ms (7440286/sec)
class 249: tested 61865984 candidates in 8317ms (7438497/sec)
class 252: tested 61865984 candidates in 8311ms (7443867/sec)
class 256: tested 61865984 candidates in 8313ms (7442076/sec)
class 261: tested 61865984 candidates in 8316ms (7439392/sec)
class 264: tested 61865984 candidates in 8307ms (7447452/sec)
class 265: tested 61865984 candidates in 8316ms (7439392/sec)
class 276: tested 61865984 candidates in 8303ms (7451039/sec)
class 277: tested 61865984 candidates in 8314ms (7441181/sec)
class 280: tested 61865984 candidates in 8311ms (7443867/sec)
class 285: tested 61865984 candidates in 8316ms (7439392/sec)
class 289: tested 61865984 candidates in 8317ms (7438497/sec)
class 292: tested 61865984 candidates in 8313ms (7442076/sec)
class 297: tested 61865984 candidates in 8314ms (7441181/sec)
class 300: tested 61865984 candidates in 8314ms (7441181/sec)
class 301: tested 61865984 candidates in 8317ms (7438497/sec)
class 304: tested 61865984 candidates in 8309ms (7445659/sec)
class 312: tested 61865984 candidates in 8317ms (7438497/sec)
class 321: tested 61865984 candidates in 8313ms (7442076/sec)
class 324: tested 61865984 candidates in 8316ms (7439392/sec)
class 325: tested 61865984 candidates in 8315ms (7440286/sec)
class 336: tested 61865984 candidates in 8313ms (7442076/sec)
class 340: tested 61865984 candidates in 8313ms (7442076/sec)
class 345: tested 61865984 candidates in 8316ms (7439392/sec)
class 349: tested 61865984 candidates in 8312ms (7442972/sec)
class 352: tested 61865984 candidates in 8318ms (7437603/sec)
class 357: tested 61865984 candidates in 8314ms (7441181/sec)
class 360: tested 61865984 candidates in 8313ms (7442076/sec)
class 361: tested 61865984 candidates in 8315ms (7440286/sec)
class 364: tested 61865984 candidates in 8317ms (7438497/sec)
class 369: tested 61865984 candidates in 8313ms (7442076/sec)
class 376: tested 61865984 candidates in 8315ms (7440286/sec)
class 381: tested 61865984 candidates in 8315ms (7440286/sec)
class 384: tested 61865984 candidates in 8314ms (7441181/sec)
class 385: tested 61865984 candidates in 8316ms (7439392/sec)
class 396: tested 61865984 candidates in 8313ms (7442076/sec)
class 397: tested 61865984 candidates in 8319ms (7436709/sec)
class 405: tested 61865984 candidates in 8317ms (7438497/sec)
class 409: tested 61865984 candidates in 8310ms (7444763/sec)
class 412: tested 61865984 candidates in 8312ms (7442972/sec)
class 417: tested 61865984 candidates in 8312ms (7442972/sec)
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 922393msec

real 15m22.494s
user 13m25.326s
sys 0m0.820s
I will now have a go at compiling it myself and see how I fare.
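The `k_min`, `k_max` and class numbers in this log can be reproduced with a little integer arithmetic. The sketch below is inferred from the printed values, not taken from the mfaktc source. It assumes that factor candidates have the form q = 2kp + 1, that k is split into 420 classes by k mod 420 (MORE_CLASSES disabled), and that a class is skipped when every q in it is divisible by 3, 5 or 7 or is not ±1 mod 8. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CLASSES 420u /* assumed: 4 * 3 * 5 * 7 */

/* Largest k with 2kp + 1 < 2^bits, i.e. floor(2^(bits-1) / p).
 * Valid for odd p > 1 and 1 <= bits <= 65. */
uint64_t k_limit(uint32_t p, unsigned bits)
{
    assert(p > 1 && (p & 1u) && bits >= 1 && bits <= 65);
    if (bits == 65)
        return UINT64_MAX / p; /* equals floor(2^64 / p): odd p never divides 2^64 */
    return (UINT64_C(1) << (bits - 1)) / p;
}

/* Round k down to the start of its block of classes (a multiple of 420). */
uint64_t round_down_to_class(uint64_t k)
{
    return k - k % NUM_CLASSES;
}

/* A class c can contain factors only if q = 2cp + 1 is +-1 mod 8
 * and not divisible by 3, 5 or 7. Adding 420 to k adds 840p to q,
 * so these residues are the same for every k in the class. */
int class_is_needed(uint32_t p, unsigned c)
{
    uint64_t q = 2 * (uint64_t)c * p + 1;
    unsigned r8 = (unsigned)(q % 8);
    return (r8 == 1 || r8 == 7) && q % 3 != 0 && q % 5 != 0 && q % 7 != 0;
}

/* Number of classes that survive the filter above. */
unsigned count_needed_classes(uint32_t p)
{
    unsigned n = 0;
    for (unsigned c = 0; c < NUM_CLASSES; ++c)
        n += class_is_needed(p, c) ? 1u : 0u;
    return n;
}
```

For p = 66362159 this gives `k_limit(p, 64)` = 138985412407 and `k_limit(p, 65)` = 277970824814, the `k_max` values logged in #50 and here, `round_down_to_class(138985412407)` = 138985412160, the `k_min` logged here, and 96 surviving classes, which is the number of class lines in this log.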

#55
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3·23·89 Posts
I just compiled successfully after changing the CUDA directory in the script.
The binary built with the old version of the script runs at two thirds the speed of the one built with the hack, which matches the speed of your binary. I will now try different sieve bounds.