Performance drop in NumPy matrix-vector multiplication
I think I have finally found the correct answer and an explanation of why:
- This problem is fixed in Python 3.8.0a2 (the current pre-release testing version).
- The problem exists in Python 3.7.2 (the latest release) on Windows and macOS.
I wrote a slightly longer program to test both my Windows and macOS computers. It looks like NumPy under Python 3.7 starts running the matmul function on all four logical processors of my computers once the matrix size reaches 96: wall time roughly doubles, while CPU time jumps from about 0.13 s to about 0.9 s. I don't see this in 3.8.0a2:
$ python3.8 numpy_matmul.py         $ python3.7 numpy_matmul.py
Python version : 3.8.0a2            Python version : 3.7.2
build:('v3.8.0a2:23f4589b4b',       build:('v3.7.2:9a3ffc0492',
'Feb 25 2019 10:59:08')             'Dec 24 2018 02:44:43')
compiler:                           compiler:
Clang 6.0 (clang-600.0.57)          Clang 6.0 (clang-600.0.57)

Tested by Python code only :        Tested by Python code only :
 90 time = 0.1132 cpu = 0.1100       90 time = 0.1535 cpu = 0.1236
 91 time = 0.1133 cpu = 0.1130       91 time = 0.1264 cpu = 0.1263
 92 time = 0.1079 cpu = 0.1077       92 time = 0.1089 cpu = 0.1087
 93 time = 0.1146 cpu = 0.1145       93 time = 0.1226 cpu = 0.1224
 94 time = 0.1176 cpu = 0.1174       94 time = 0.1273 cpu = 0.1271
 95 time = 0.1216 cpu = 0.1215       95 time = 0.1372 cpu = 0.1371
 96 time = 0.1115 cpu = 0.1114       96 time = 0.2854 cpu = 0.8933
 97 time = 0.1231 cpu = 0.1229       97 time = 0.2887 cpu = 0.9033
 98 time = 0.1174 cpu = 0.1173       98 time = 0.2836 cpu = 0.8963
 99 time = 0.1330 cpu = 0.1301       99 time = 0.3100 cpu = 0.9108
100 time = 0.1130 cpu = 0.1128      100 time = 0.3149 cpu = 0.9087

Tested with timeit.repeat :         Tested with timeit.repeat :
 90 time = 0.1060 cpu = 0.1066       90 time = 0.1238 cpu = 0.3264
 91 time = 0.1091 cpu = 0.1097       91 time = 0.1233 cpu = 0.1240
 92 time = 0.1021 cpu = 0.1027       92 time = 0.1138 cpu = 0.1128
 93 time = 0.1149 cpu = 0.1156       93 time = 0.1324 cpu = 0.1327
 94 time = 0.1135 cpu = 0.1139       94 time = 0.1319 cpu = 0.1326
 95 time = 0.1170 cpu = 0.1177       95 time = 0.1325 cpu = 0.1331
 96 time = 0.1069 cpu = 0.1076       96 time = 0.2879 cpu = 0.8886
 97 time = 0.1192 cpu = 0.1198       97 time = 0.2867 cpu = 0.8986
 98 time = 0.1151 cpu = 0.1155       98 time = 0.3034 cpu = 0.8854
 99 time = 0.1200 cpu = 0.1207       99 time = 0.2867 cpu = 0.8966
100 time = 0.1146 cpu = 0.1153      100 time = 0.2901 cpu = 0.9018
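To make the jump at size 96 explicit, the ratio of CPU time to wall time can be computed from the Python 3.7.2 column above. This is only a sketch: the values are copied from the "by Python code only" run as printed (that is, after the correct_cpu adjustment in the script), and the name cpu_to_wall_ratio is mine, not part of the original program. A ratio near 1 is what a single busy thread would give; a ratio well above 1 suggests several logical processors were working at once.

```python
# (size, wall time, cpu time) copied from the Python 3.7.2 column,
# "Tested by Python code only" run.
rows_py37 = [
    (94, 0.1273, 0.1271),
    (95, 0.1372, 0.1371),
    (96, 0.2854, 0.8933),
    (97, 0.2887, 0.9033),
]


def cpu_to_wall_ratio(wall, cpu):
    """Return CPU seconds consumed per second of wall-clock time."""
    return cpu / wall


ratios = {size: cpu_to_wall_ratio(wall, cpu) for size, wall, cpu in rows_py37}

for size, ratio in ratios.items():
    print('%3d cpu/wall = %.2f' % (size, ratio))
```

For sizes 94 and 95 the ratio stays close to 1, and from 96 on it is above 3, which is consistent with the work being spread over the four logical processors.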
Here is numpy_matmul.py:
import platform
import time
import timeit

import numpy as np


def correct_cpu(cpu_time):
    # Halve the measured CPU time on Python 3.5-3.8 builds made with this
    # specific Clang 6.0 compiler. The string comparison only works for
    # single-digit minor versions.
    pv1, pv2, _ = platform.python_version_tuple()
    pcv = platform.python_compiler()
    if pv1 == '3' and '5' <= pv2 <= '8' and pcv == 'Clang 6.0 (clang-600.0.57)':
        cpu_time /= 2.0
    return cpu_time


def test(func, n, name):
    # Run func for matrix sizes 90..100 and print wall time and CPU time.
    print('\nTested %s :' % name)
    for i in range(90, 101):
        t = time.perf_counter()
        c = time.process_time()
        tm = func(i, n)
        t = time.perf_counter() - t
        c = correct_cpu(time.process_time() - c)
        # Use the time reported by func if it measured one itself,
        # otherwise the wall time measured here.
        st = t if tm <= 0.0 else tm
        print('%3d time = %.4f cpu = %.4f' % (i, st, c))
        # Flag runs where the two wall-time measurements differ by more than 2%.
        if abs(t - st) / st > 0.02:
            print(' time!= %.4f' % t)


def test1(i, n):
    # Plain Python loop; returns 0.0 so that test() uses its own timing.
    a, b = np.random.rand(i, i), np.random.rand(i)
    for _ in range(n):
        np.matmul(a, b)
    return 0.0


def test2(i, n):
    # timeit.repeat returns one timing per repeat (5 repeats by default
    # since Python 3.7); return their sum.
    s = 'import numpy as np;' + \
        'a, b = np.random.rand({0},{0}), np.random.rand({0})'
    s = s.format(i)
    r = 'np.matmul(a, b)'
    t = timeit.repeat(stmt=r, setup=s, number=n)
    return sum(t)


def test3(i, n):
    # Single timeit.timeit run of n calls.
    s = 'import numpy as np;' + \
        'a, b = np.random.rand({0},{0}), np.random.rand({0})'
    s = s.format(i)
    r = 'np.matmul(a, b)'
    return timeit.timeit(stmt=r, setup=s, number=n)


print('Python version :', platform.python_version())
print(' build :', platform.python_build())
print(' compiler :', platform.python_compiler())
num = 10000
test(test1, 5 * num, 'by Python code only')
test(test2, num, 'with timeit.repeat')
test(test3, 5 * num, 'with timeit.timeit')
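The wall-time versus CPU-time comparison that test() performs can also be pulled out into a small helper for checking other calls. This is a minimal sketch and not part of numpy_matmul.py: wall_and_cpu and busy_loop are hypothetical names, the correct_cpu adjustment is deliberately left out, and busy_loop is just a single-threaded stand-in so the example runs without NumPy.

```python
import time


def wall_and_cpu(func, *args):
    """Run func(*args) once and return (wall seconds, CPU seconds)."""
    t = time.perf_counter()
    c = time.process_time()
    func(*args)
    return time.perf_counter() - t, time.process_time() - c


def busy_loop(n):
    """Single-threaded pure-Python workload, used only as a stand-in."""
    total = 0
    for k in range(n):
        total += k * k
    return total


wall, cpu = wall_and_cpu(busy_loop, 200000)
print('time = %.4f cpu = %.4f' % (wall, cpu))
```

Passing a function that calls np.matmul in a loop instead of busy_loop would give the same kind of time/cpu pair as the tables above; a CPU time several times larger than the wall time is the pattern seen for sizes 96 and up under Python 3.7.2.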