mersenneforum.org  

2021-09-24, 03:53   #1
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
2⁴×3²×53 Posts
Linux memory help

I'm running prime95 on a very basic Linux install. Ubuntu 14 or 16, no GUI, nothing much running but prime95.
I'm trying to understand the random "Killed" problem during stage 2. The machine has 8GB of memory, with 5.5GB allotted to mprime.

Top shows this:

Code:
Tasks: 118 total,   1 running, 117 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.2 sy, 99.7 ni,  0.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  7860544 total,   169252 free,  7455956 used,   235336 buff/cache
KiB Swap:        0 total,        0 free,        0 used.    93612 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23137 george    30  10 6803204 6.004g   3828 S 398.3 80.1   2662:36 mprime307

Question 1: It seems I have no swapfile (makes sense as there is no hard disk, only an NFS mount). So why does VIRT (virtual memory?) show 6.8GB and RES (resident memory?) show 6GB?

Question 2: The above works OK. But if I let mprime use 6GB, VIRT goes up to almost 7GB and RES goes up to about 6.1GB. This configuration usually works. With OutputIterations set pretty low, I get output at a fairly constant 129 sec. Sometimes, though, mprime gets slow (200+ sec.). To me this looks like thrashing, but how can that be without a swapfile? Perhaps a system process runs, the memory marked as cache is converted to free memory, and that cache then gets reloaded after the system process completes?
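
One way to test the thrashing hypothesis in Question 2, sketched here as an aside: with no swap, anonymous memory cannot be paged out, but file-backed pages (program code, cached files) can still be dropped and re-read, and each re-read shows up as a major page fault. The helper names below are hypothetical; the code only parses the documented `minflt` and `majflt` fields of a `/proc/<pid>/stat` line. The sample line is made up for illustration, not captured from the machine above.

```python
def parse_faults(stat_line):
    """Return (minor_faults, major_faults) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the LAST ')' before counting fields.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state); minflt is field 10, majflt is field 12.
    return int(after_comm[7]), int(after_comm[9])


def read_faults(pid):
    """Read the fault counters of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as stat:
        return parse_faults(stat.read())


# Made-up sample line for illustration; real lines have many more fields.
SAMPLE_STAT = "23137 (mprime307) S 1 23137 23137 0 -1 4194304 1500 0 42 0"

if __name__ == "__main__":
    print(parse_faults(SAMPLE_STAT))
```

Sampling `read_faults(pid)` a few seconds apart during a slow stretch would show whether the major-fault count is climbing. A climbing count would be consistent with pages being re-read from the NFS mount, though it would not prove that this is the cause of the slowdown.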

Question 3: Mprime (and prime95) do not let the user input too large a value for memory to use. We do not want naive users (like me??) inputting a value that is apt to cause thrashing. One can always edit local.txt to put in larger values. The current formula allows up to 90% of system memory (7.2GB in my case -- clearly too much). What would be a better formula? Perhaps reserving 10% of system memory or 2.5GB, whichever is larger? Suggestions welcome.
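
The reserve rule floated in Question 3 can be sketched as follows. This is only an illustration of the arithmetic, not mprime's actual code: the function name `max_stage2_mib`, its parameters, and the use of whole MiB are assumptions, and the 10% and 2.5GB figures are simply the ones proposed above.

```python
def max_stage2_mib(total_mib, reserve_pct=10, reserve_floor_mib=2560):
    """Largest memory setting (in MiB) a user could enter under the proposed rule.

    Hypothetical helper: hold back the larger of reserve_pct percent of
    system memory or reserve_floor_mib (2560 MiB = 2.5GB) for the OS,
    and allow the rest. Never returns a negative value.
    """
    reserve = max(total_mib * reserve_pct // 100, reserve_floor_mib)
    return max(total_mib - reserve, 0)


if __name__ == "__main__":
    for total in (4096, 8192, 65536):
        print(total, "MiB total ->", max_stage2_mib(total), "MiB allowed")
```

For a nominal 8GB (8192 MiB) machine this gives 5632 MiB, i.e. 5.5GB, which matches the setting reported as working above, whereas the current 90% rule allows about 7.2GB.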

P.S. The above was in 30.7 which has a bug where it sometimes allocates more memory than necessary. The working set obeys the memory limit but peak memory usage is higher than it needs to be. I'm working on a fix and can update these "top" numbers when fixed.

Last fiddled with by Prime95 on 2021-09-24 at 03:56
2021-09-24, 06:46   #2
Nick
Dec 2012
The Netherlands
11011010000₂ Posts

The information on this page is several years old but might possibly help.
2021-09-24, 08:31   #3
preda
"Mihai Preda"
Apr 2015
1373₁₀ Posts

Quote:
Originally Posted by Prime95
Question 1: It seems I have no swapfile (makes sense as there is no hard disk, only an NFS mount). So why does VIRT (virtual memory?) show 6.8GB and RES (resident memory?) show 6GB?

In my understanding, VIRT in "top" for a process indicates "virtual" memory that is not mapped to physical memory. That would happen for example after a malloc() but before writing anything to the malloc'ed range. And RES is memory mapped to physical pages. Thus, VIRT is limited by the virtual address range (huge), not by the physical memory or swap.

OTOH it's a good idea to have a swap set up, in order for it to collect the "dead" regions of allocated memory (freeing the physical memory from junk). Creating the swap does not require a swap *partition*; it can be done in a file using "mkswap" and "swapon".
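
The allocate-but-don't-touch behaviour described above can be observed directly. The following is a minimal sketch, assuming Linux and `/proc/self/status`; the helper names `vm_kib` and `reserve_then_touch` are made up for this example. Strictly speaking, VIRT in top counts everything the process has mapped, touched or not, while RES counts only the part currently backed by physical pages, so an untouched mapping raises VIRT but not RES.

```python
import mmap


def vm_kib(field):
    """Read one Vm* field (reported in KiB) from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)


def reserve_then_touch(size_bytes):
    """Map anonymous memory, then write to every page of it.

    Returns (VIRT growth after mmap, RES growth after mmap,
    RES growth after touching every page), all in KiB.
    """
    virt0, res0 = vm_kib("VmSize"), vm_kib("VmRSS")
    region = mmap.mmap(-1, size_bytes,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    virt1, res1 = vm_kib("VmSize"), vm_kib("VmRSS")
    for offset in range(0, size_bytes, mmap.PAGESIZE):
        region[offset] = 1  # the first write makes the kernel back this page
    res2 = vm_kib("VmRSS")
    region.close()
    return virt1 - virt0, res1 - res0, res2 - res0


if __name__ == "__main__":
    print(reserve_then_touch(64 * 1024 * 1024))
```

On a typical Linux system the first number should be close to the mapping size, the second close to zero, and the third close to the mapping size again. Exact values vary with kernel version and settings such as transparent huge pages.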
2021-09-27, 02:27   #4
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
16720₈ Posts

Update. I've worked on reducing the peak memory usage during stage 2 init in 30.7. Top now reports this with stage 2 allowed to use 6GB. Roughly the same numbers as the previous 30.7 version with 5.5GB memory allowance.

Code:
top - 22:20:00 up 321 days,  5:13,  0 users,  load average: 3.99, 3.55, 3.29
Tasks: 119 total,   1 running, 118 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.1 sy, 99.8 ni,  0.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  7860544 total,   138896 free,  7452364 used,   269284 buff/cache
KiB Swap:        0 total,        0 free,        0 used.    80212 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
25939 george    30  10 6787548 6.004g   5936 S 399.7 80.1  32:02.13 mprime307
25958 george    20   0   43340   3644   3072 R   0.3  0.0   0:00.03 top
    1 root      20   0  121836   5952   1980 S   0.0  0.1   1:47.94 systemd

I'll see if this suffers from the random "Killed" problem overnight. Next I'll try creating a small swap file.
2021-09-27, 03:39   #5
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair
14210₈ Posts

Quote:
Originally Posted by preda
In my understanding, VIRT in "top" for a process indicates "virtual" memory that is not mapped to physical memory. That would happen for example after a malloc() but before writing anything to the malloc'ed range. And RES is memory mapped to physical pages. Thus, VIRT is limited by the virtual address range (huge), not by the physical memory or swap.

Yeah, that is basically correct. I've seen processes have TiB of VIRT without any issue. It can be safely ignored. But RES, that's where the action happens. You can freely allocate PiB of memory (so I have read, not tried it myself) and all is well until you try to actually use too much of it.

ETA: Linux allocations are equivalent to Windows allocations with the MEM_RESERVE flag set and the MEM_COMMIT flag unset, the difference being that Linux auto-commits on access, whereas Windows will GPF upon access.

Quote:
Originally Posted by preda
OTOH it's a good idea to have a swap set up, in order for it to collect the "dead" regions of allocated memory (freeing the physical memory from junk). Creating the swap does not require a swap *partition*, can be done in a file using "mkswap" and "swapon".

IMO swap is of no value if you have nowhere to swap to.

And IMO swap is also of no value even when there is somewhere to swap to; just buy more RAM. YMMV :P

Last fiddled with by retina on 2021-09-27 at 03:46 Reason: Add Windows equivalence
2021-09-27, 06:04   #6
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7632₁₀ Posts

More investigation. These Linux machines have 512 huge pages (which I think are 2MB each). Thus 1GB may not be accessible for P-1 stage 2.
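
The 1GB figure is just 512 × 2MB. As a rough sketch, the same number can be read from `/proc/meminfo`, whose `HugePages_Total` and `Hugepagesize` fields give the page count and the page size in kB. The function name and the sample text below are assumptions for illustration; the sample mirrors the numbers mentioned above rather than an actual capture.

```python
def hugepage_reserved_kib(meminfo_text):
    """Return the memory (KiB) set aside for the huge page pool.

    Expects text in /proc/meminfo format and multiplies HugePages_Total
    by Hugepagesize (which the kernel reports in kB).
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields.get("HugePages_Total", 0) * fields.get("Hugepagesize", 0)


# Illustrative sample: 512 huge pages of 2MB (2048 kB) each.
SAMPLE_MEMINFO = (
    "MemTotal:        7860544 kB\n"
    "HugePages_Total:     512\n"
    "Hugepagesize:       2048 kB\n"
)

if __name__ == "__main__":
    print(hugepage_reserved_kib(SAMPLE_MEMINFO), "KiB reserved for huge pages")
```

On a live machine the same function can be fed `open("/proc/meminfo").read()`. Memory in the pool is only usable by allocations that explicitly ask for huge pages, which is why it may be unavailable to ordinary stage 2 allocations.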

When large page support was wedged into gwnum, the sin/cos tables and one gwnum were allocated with large pages. This was adequate in a world where LL testing was the norm.
2021-09-28, 18:13   #7
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2³×5×157 Posts

Quote:
Originally Posted by retina
You can freely allocate PiB of memory (so I have read, not tried it myself) ...

That last part in brackets is incorrect. I got curious so I tried it.

I allocated 2PiB of virtual memory on the smallest machine I could find, one with 4GiB of RAM installed.
Code:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16894 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.10 mem_reserve_tes
16906 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.10 mem_reserve_tes
16914 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.10 mem_reserve_tes
16922 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.10 mem_reserve_tes
16930 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.10 mem_reserve_tes
16938 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.10 mem_reserve_tes
16946 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.11 mem_reserve_tes
16954 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.10 mem_reserve_tes
16962 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.10 mem_reserve_tes
16970 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.10 mem_reserve_tes
16978 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.10 mem_reserve_tes
16990 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.10 mem_reserve_tes
16998 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.10 mem_reserve_tes
17006 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.11 mem_reserve_tes
17014 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.10 mem_reserve_tes
17017 retina    20   0  0.125p   2052      4 S   0.0  0.1   0:00.10 mem_reserve_tes

Each task can allocate a maximum of just under 128TiB.
Code:
~ for x in {1..16} ; do ./mem_reserve_test & sleep 1 ; done
[1] 16894
7ffec0000000
[2] 16906
7ffec0000000
[3] 16914
7ffec0000000
[4] 16922
7ffec0000000
[5] 16930
7ffec0000000
[6] 16938
7ffec0000000
[7] 16946
7ffec0000000
[8] 16954
7ffec0000000
[9] 16962
7ffec0000000
[10] 16970
7ffec0000000
[11] 16978
7ffec0000000
[12] 16990
7ffec0000000
[13] 16998
7ffec0000000
[14] 17006
7ffec0000000
[15] 17014
7ffec0000000
[16] 17017
7ffec0000000

So that is essentially the full 47 bits of address space maxed out, 16 times over.
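
As a sanity check on those numbers, here is the arithmetic spelled out. The constants come from the output above and from ALLOC_START and ALLOC_STEPS in the test code that follows. The reading that the 5GiB shortfall is the 4GiB starting offset plus one final 1GiB step that could not be mapped is an inference from the numbers, not something stated in the post.

```python
TIB = 1 << 40
GIB = 1 << 30

reported = 0x7ffec0000000    # bytes mapped per task, as printed by each run
user_space = 1 << 47         # 47-bit user address space = 128 TiB
alloc_start = 1 << 32        # ALLOC_START in the test program (4 GiB)
alloc_step = 1 << 30         # ALLOC_STEPS in the test program (1 GiB)

shortfall = user_space - reported   # how far short of 2**47 each task stopped
total = 16 * reported               # all 16 tasks together

if __name__ == "__main__":
    print("per task :", reported / TIB, "TiB")
    print("shortfall:", shortfall // GIB, "GiB")
    print("16 tasks :", total / (1 << 50), "PiB")
```

Each task ends up with about 127.995TiB, which top rounds to 0.125p, and the 16 tasks together hold a little under 2PiB.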

And the test code in case anyone cares to try it.
Code:
ALLOC_START		= 1 shl 32
ALLOC_STEPS		= 1 shl 30

format elf64 executable 0 at 1 shl 16
entry main

segment executable readable

HANDLE_STD_OUTPUT	= 1

; see /usr/include/asm/unistd_64.h
SYS64_WRITE		= 1
SYS64_MMAP		= 9
SYS64_NANOSLEEP		= 35
SYS64_EXIT		= 60

MMAP_PROT_READ		= 0x1
MMAP_PROT_WRITE		= 0x2
MMAP_MAP_PRIVATE	= 0x2
MMAP_MAP_FIXED		= 0x10
MMAP_MAP_ANONYMOUS	= 0x20
MMAP_MAP_FIXED_NOREPLACE= 0x100000

main:
	mov	r15,ALLOC_START
	mov	r14,ALLOC_STEPS
    .alloc_loop:
	xor	r9,r9			;offset
	or	r8,-1			;fd
	mov	r10,MMAP_MAP_PRIVATE or MMAP_MAP_ANONYMOUS or MMAP_MAP_FIXED or MMAP_MAP_FIXED_NOREPLACE
	mov	edx,MMAP_PROT_READ or MMAP_PROT_WRITE
	mov	rsi,r14			;size
	mov	rdi,r15			;address
	mov	eax,SYS64_MMAP
	syscall
	cmp	rax,r15
	jnz	.done
	add	r15,r14
	jmp	.alloc_loop
    .done:
	mov	rax,-ALLOC_START
	add	rax,r15
	call	print_hex
	mov	al,10
	call	print_char
	mov	eax,60
	call	sleep
	mov	rax,SYS64_EXIT
	xor	edi,edi
	syscall

print_hex:
	mov	rdi,rsp
	sub	rsp,32
	dec	rdi
	mov	byte[rdi],0
	mov	ecx,16
    .next_digit:
	xor	edx,edx
	div	rcx
	xchg	rdx,rax
	lea	ebx,[eax+'a'-10]
	add	al,'0'
	cmp	al,'9'
	cmova	eax,ebx
	dec	rdi
	mov	[rdi],al
	test	rdx,rdx
	mov	rax,rdx
	jnz	.next_digit
	mov	rax,rdi
	call	print_string
	add	rsp,32
	ret

print_string:
	mov	rdi,rax
    .next_char:
	mov	al,[rdi]
	test	al,al
	jz	.done
	push	rdi
	call	print_char
	pop	rdi
	inc	rdi
	jmp	.next_char
    .done:
	ret

print_char:
	push	rax
	mov	eax,SYS64_WRITE
	mov	edi,HANDLE_STD_OUTPUT
	mov	rsi,rsp
	mov	edx,1
	syscall
	pop	rax
	ret

sleep:
	push	0 rax
	mov	eax,SYS64_NANOSLEEP
	mov	rdi,rsp
	xor	esi,esi
	syscall
	pop	rax rax
	ret

Old school assembly with no macros or other modern rubbish like calling standards or anything. You'll need fasm to compile it. It has a whopping 350 byte executable size.
2021-09-28, 19:44   #8
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
2⁴×3²×53 Posts

Quote:
Originally Posted by retina
Old school assembly with no macros ...

Manly programming

More on my problem. The latest change was killed again. I found that writing a save file during stage 2 requires 2 more temporaries (~100MB). I've written a "fix" that immediately frees that memory back to the OS. So far, top is reporting a pretty steady 6.01GB and mprime hasn't been killed in the last 12 hours.

I do have an idea to reduce the cost of creating a save file to one gwnum or perhaps even less (~12MB). Not going to happen for 30.7.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.