Is there an analog of memset in Go?
According to the bug titled "optimize memset idiom", there is no way to do this in Go other than with a loop. The issue was closed on 9 Jan 2013 with this comment:
I consider this fixed. Optimizing non-zero cases isn't very interesting.
We can open another bug if people feel strongly about doing more.
So the solution is to use a loop, as already covered by icza.
There is bytes.Repeat, but that also just uses a loop:
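Note that for the special case of zeroing, the plain range loop is exactly what the issue's resolution refers to: recent gc compilers recognize this pattern and lower it to an internal memclr, so no explicit memset is needed for the zero value. A minimal sketch (whether the optimization fires depends on your compiler version):

```go
package main

import "fmt"

func main() {
	a := []int{1, 2, 3, 4}

	// The idiomatic way to zero a slice: recent gc compilers
	// recognize this loop and compile it down to an internal
	// memclr, so it behaves like memset(p, 0, n) in C.
	for i := range a {
		a[i] = 0
	}

	fmt.Println(a) // [0 0 0 0]
}
```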
func Repeat(b []byte, count int) []byte {
	nb := make([]byte, len(b)*count)
	bp := copy(nb, b)
	for bp < len(nb) {
		copy(nb[bp:], nb[:bp])
		bp *= 2
	}
	return nb
}
The simplest solution with a loop would look like this:
func memsetLoop(a []int, v int) {
	for i := range a {
		a[i] = v
	}
}
There is no memset support in the standard library, but we can make use of the built-in copy(), which is highly optimized.
With repeated copy()
We can set the first element manually, then repeatedly copy the already-set part onto the still-unset part using copy(). The already-set part doubles on every iteration, so only log(n) iterations are needed:
func memsetRepeat(a []int, v int) {
	if len(a) == 0 {
		return
	}
	a[0] = v
	for bp := 1; bp < len(a); bp *= 2 {
		copy(a[bp:], a[:bp])
	}
}
This solution was inspired by the implementation of bytes.Repeat(). If you just want to create a new []byte filled with the same values, you can use the bytes.Repeat() function. You can't use it for an existing slice or for slices of types other than []byte; for those you can use the presented memsetRepeat().
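Since Go 1.18 the same doubling technique can also be written once with type parameters so that it works for slices of any element type. This generic variant is my own sketch, not part of the original answer:

```go
package main

import "fmt"

// memsetRepeat rewritten with type parameters (Go 1.18+) so it
// works for any element type, not just int. The doubling logic
// is identical to the int-only version above.
func memsetRepeat[T any](a []T, v T) {
	if len(a) == 0 {
		return
	}
	a[0] = v
	for bp := 1; bp < len(a); bp *= 2 {
		copy(a[bp:], a[:bp])
	}
}

func main() {
	s := make([]string, 4)
	memsetRepeat(s, "x")
	fmt.Println(s) // [x x x x]
}
```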
For small slices memsetRepeat() may be slower than memsetLoop() (but for small slices it doesn't really matter; both run in an instant). Thanks to the fast copy(), memsetRepeat() becomes much faster as the number of elements grows.
Benchmarking these 2 solutions:
var a = make([]int, 1000) // Size will vary

func BenchmarkLoop(b *testing.B) {
	for i := 0; i < b.N; i++ {
		memsetLoop(a, 10)
	}
}

func BenchmarkRepeat(b *testing.B) {
	for i := 0; i < b.N; i++ {
		memsetRepeat(a, 11)
	}
}
Benchmark results:
100 elements: ~1.15 times faster
BenchmarkLoop 20000000 81.6 ns/op
BenchmarkRepeat 20000000 71.0 ns/op
1,000 elements: ~2.5 times faster
BenchmarkLoop 2000000 706 ns/op
BenchmarkRepeat 5000000 279 ns/op
10,000 elements: ~2 times faster
BenchmarkLoop 200000 7029 ns/op
BenchmarkRepeat 500000 3544 ns/op
100,000 elements: ~1.5 times faster
BenchmarkLoop 20000 70671 ns/op
BenchmarkRepeat 30000 45213 ns/op
The highest performance gain is around 3800-4000 elements where it is ~3.2 times faster.