mmap may slow down your Go app
Do you use syscall.Mmap in Go? Chances are high that the answer is yes, even if you are unaware of it. Your app's direct or indirect dependencies may use syscall.Mmap because of a widespread belief that mmap is faster than plain old file I/O. Let's try to understand whether this is true.
What is mmap?
mmap is a system call for mapping file contents into memory. After the mapping, you can read and/or write the file contents by just accessing the memory region returned by the syscall. Convenient, isn't it? There is no need for heavy system calls to read and/or write the file contents. Win-win? No!
How does mmap work?
What happens when a program accesses a memory address inside the region returned by mmap? There are two cases:
- The given memory address points to hot data already present in memory. Such memory is known as the page cache. In this case the access may indeed be faster than the access via read / write syscalls.
- The given memory address points to cold data missing from the page cache. In this case the operating system (OS) intercepts the memory access via a major page fault, loads the requested data from the mmap'ed file into the page cache and then returns control to the program. All this magic is invisible to the program: it just accesses data at the specified memory location as usual. But it comes at a very high price: cold data access can take about 100000x more time than hot data access. Why? See Latency Numbers Every Programmer Should Know.
I hear your voices: "read / write syscalls have almost the same price for cold data access as major page faults for an mmap'ed file; the implicit memory interception is just substituted by an explicit system call". Yes. But let's take a closer look at the Go runtime.
What's wrong with mmap in Go?
Go runs goroutines on OS threads. Up to GOMAXPROCS goroutines can be executed simultaneously by OS threads. Other ready goroutines wait for their turn until the currently running goroutines block, yield or get stuck in a cgo call or a syscall. Goroutines may block on I/O, a channel or a mutex. Goroutines may yield on a function call, a memory allocation or an explicit runtime.Gosched call. Goroutines don't block on a major page fault!
Again: goroutines don't block and don't yield on a major page fault, since it is invisible to the Go runtime. What happens when a goroutine accesses cold data in an mmap'ed file? It gets stuck for a looooong time. During this time it occupies one of the GOMAXPROCS OS threads, so other ready goroutines have fewer threads to run on. This leads to CPU under-utilization. What happens if GOMAXPROCS goroutines concurrently access cold data regions in an mmap'ed file? A complete stall of the whole program until the OS resolves the major page faults caused by these goroutines!
How to detect stalls caused by major page faults in Go programs?
Monitor request latencies and CPU usage:
- latencies for ALL the requests usually increase during stalls;
- user CPU share drops, since the program performs less work during stalls;
- system and iowait CPU shares increase because the OS handles major page faults.
How to deal with these stalls?
- Increase GOMAXPROCS to N x NumCPU. This reduces the chances of CPU under-utilization during major page faults at the cost of higher CPU usage, since now each CPU core deals with multiple OS threads.
- Access mmap'ed data only via cgo calls. Go launches an additional OS thread for each goroutine stuck inside a cgo call. This prevents CPU under-utilization at the cost of higher CPU usage, since cgo calls are expensive.
- Do not use mmap in Go programs. This solution has no drawbacks :)
There are many Go programs with mmap inside and nobody complains.
These programs may work without issues as long as the data accessed from the mmap'ed file fits in the page cache. Page cache size is limited by RAM size, so these programs should experience stalls when mmap'ed files exceed RAM size. Program stalls may go unnoticed under low load and on faster storage (SSDs instead of HDDs or network storage).
The program may also experience stalls when the mmap'ed file is smaller than the RAM size, in the following cases:
- On the first access to mmap'ed data if it isn't present in the page cache yet. Such stalls usually occur during program warmup. Note that stalls induced by major page faults increase latencies for ALL the requests, including requests that don't touch mmap'ed data.
- On the first access to mmap'ed data after its eviction from the page cache. The eviction may be caused by a third-party app running on the same OS. For instance, an innocent grep over a big log file quickly evicts useful data from the page cache.
Conclusion
Avoid mmap inside Go programs, since it may cause stalls.
Read how we created the best remote storage for Prometheus. It is written in Go and it doesn't use mmap :)
Update: VictoriaMetrics, our high-performance time series database, is open source now, so you can inspect the code and verify that it doesn't use mmap :)