Understanding sshd logs
There is no good port to use, only good SSH configurations. If you disable password-based logins and only allow key-based authentication, you won’t risk much from such brute-forcing attempts. You could add port-knocking, but that’s security by obscurity.
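To check the "disable password-based logins" part, here is a rough Python sketch that reads sshd_config-style text and reports whether PasswordAuthentication is set to no. The helper names (effective_settings, password_logins_disabled) are made up for illustration, and the parser is deliberately simplified: it ignores Match blocks, Include directives and the keyword=value form. It assumes sshd's documented behaviour that the first value obtained for a keyword wins and that PasswordAuthentication defaults to yes. Running sshd -T on the server is the authoritative way to see the effective configuration.

```python
def effective_settings(config_text):
    """Return {lowercased keyword: value}, keeping the first occurrence of each keyword.

    Simplified sketch: Match blocks, Include and 'keyword=value' are not handled.
    """
    settings = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # keyword without an argument; ignore in this sketch
        key, value = parts[0].lower(), parts[1].strip()
        settings.setdefault(key, value)  # first obtained value wins
    return settings


def password_logins_disabled(config_text):
    """True if PasswordAuthentication is explicitly 'no' (assumed default: yes)."""
    value = effective_settings(config_text).get("passwordauthentication", "yes")
    return value.lower() == "no"


if __name__ == "__main__":
    sample = "PasswordAuthentication no\nPubkeyAuthentication yes\n"
    print(password_logins_disabled(sample))
```

Keyboard-interactive authentication can still prompt for passwords on some setups, so treat this as a first check rather than a full audit.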
The port numbers listed on the right of the logs are the source ports; these are dynamically allocated and are on the source system, not the target system.
[preauth] means that the logged event happened before the connection was authenticated; in this case, the connection was closed before authentication completed.
All the entries in your second set of logs correspond to non-SSH traffic sent to your dæmon. You’ll see this happen quite a lot, especially since you’re listening on a non-standard port: various scanners will send requests without knowing what is listening on the other end.
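If you want to sort through such entries in bulk, a small Python sketch like the following pulls out the source address, the source port, whether the event was pre-authentication, and any non-SSH banner that sshd complained about. The function name summarize and the regular expressions are my own, and they only cover IPv4 and the two message shapes discussed here. The "Bad protocol version identification" line is the one from the question; the "Connection closed by" line is an illustrative sample, not taken from your logs.

```python
import re

# "<ipv4> port <n>", as sshd appends to many messages (IPv4 only in this sketch).
PEER_RE = re.compile(r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) port (?P<port>\d+)")
# The quoted string is whatever the client sent instead of an SSH version banner.
BANNER_RE = re.compile(r"Bad protocol version identification '(?P<banner>.*)' from ")


def summarize(line):
    """Extract a few fields from one sshd log message."""
    peer = PEER_RE.search(line)
    banner = BANNER_RE.search(line)
    return {
        "source_ip": peer.group("ip") if peer else None,
        "source_port": int(peer.group("port")) if peer else None,
        "preauth": line.rstrip().endswith("[preauth]"),
        "non_ssh_banner": banner.group("banner") if banner else None,
    }


if __name__ == "__main__":
    lines = [
        "Bad protocol version identification 'REMOTE HI_SRDK_DEV_GetHddInfo MCTP/1.0' "
        "from 162.207.145.58 port 48248",
        "Connection closed by 209.17.97.34 port 43944 [preauth]",  # illustrative sample
    ]
    for entry in lines:
        print(summarize(entry))
```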
Scanning large portions of the Internet, on a variety of ports, doesn’t take very long if you have well-connected systems to scan from, or a large number of compromised hosts in a botnet. See masscan for an example of a mass-scanning tool. There are also lists of known-open IP addresses and ports which are circulated, so all it takes is for one scan to find your open port 9000.
This falls short of a comprehensive guide to sshd logs, but to address your points:
Did I just choose a port that's too easy to guess? What would be a better port number?
There are "only" 65,535 ports, and scanners are good at finding them, so once you've moved beyond port 22 to avoid the simplest scans, there's not a whole lot of benefit to picking one arbitrary port over another.
What do the port numbers in these sshd logs even mean? How can they have access to port (43944) if my router is only configured to forward port 9000 to port 22?
The port numbers after the IP addresses, such as 209.17.97.34 port 43944, indicate the source side's port, which was most likely chosen arbitrarily by the kernel on that side. It means next to nothing to you.
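You can see the same thing on your own machine. The Python sketch below (observe_ports is a made-up name) opens a listener on the loopback interface as a stand-in for sshd, connects to it, and compares the port the listener is bound to with the port the server sees as the peer's. The client never asks for a particular source port; its kernel picks one, and that is the number that would show up in a log line.

```python
import socket


def observe_ports():
    """Return (listen_port, client_local_port, source_port_seen_by_server)."""
    # Listener on an OS-chosen loopback port, standing in for sshd.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    listen_port = server.getsockname()[1]

    # The client only names the destination; its own port is picked by the kernel.
    client = socket.create_connection(("127.0.0.1", listen_port))
    conn, peer = server.accept()

    client_local_port = client.getsockname()[1]
    source_port_seen = peer[1]  # what a server would log as "port N"

    conn.close()
    client.close()
    server.close()
    return listen_port, client_local_port, source_port_seen


if __name__ == "__main__":
    listen_port, client_port, seen = observe_ports()
    print("listening on:", listen_port)
    print("client's own port:", client_port)
    print("source port seen by server:", seen)
```

The port forwarded on your router corresponds to the listening side here; the number in the log corresponds to the client's side.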
What does [preauth] mean?
It's short for "pre (before) authentication"; SSH proceeds in stages, and this is one of them. There are other, similar questions here at U&L.
What does Bad protocol version identification 'REMOTE HI_SRDK_DEV_GetHddInfo MCTP/1.0' from 162.207.145.58 port 48248 mean?
A quick Google search turned up the Stack Overflow question "strange query in server logs REMOTE HI_SRDK_DEV_GetHddInfo", reinforcing the idea that this is a scanner looking for "opportunities".
Well, your bad protocol stuff is a scan for Kguard Digital Video Recorders vulnerable to CVE-2015-4464. They just happen to default to port 9000.
https://dl.packetstormsecurity.net/1506-exploits/kdvr-authorization.txt