Is it really possible for WebRTC to stream high-quality audio without noise?
The default WebRTC audio settings are fairly low quality: mono audio at around 42 kb/s, since the defaults appear to be tuned for voice. I improved the quality by changing a few settings.
- Disable autoGainControl, echoCancellation and noiseSuppression in the getUserMedia() constraints:
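    // Bare values are treated as preferences; the browser may ignore
    // any constraint it does not support.
    navigator.mediaDevices.getUserMedia({
      audio: {
        autoGainControl: false,   // no automatic level adjustment
        channelCount: 2,          // request stereo capture
        echoCancellation: false,  // voice processing off
        latency: 0,               // request the lowest available latency
        noiseSuppression: false,  // voice processing off
        sampleRate: 48000,        // Hz
        sampleSize: 16,           // bits per sample
        volume: 1.0
      }
    });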
- Add the stereo and maxaveragebitrate attributes to the SDP:
    let answer = await peer.conn.createAnswer(offerOptions);
    // Append the Opus parameters; this only works if the SDP contains 'useinbandfec=1'.
    answer.sdp = answer.sdp.replace('useinbandfec=1', 'useinbandfec=1; stereo=1; maxaveragebitrate=510000');
    await peer.conn.setLocalDescription(answer);
This gives a potential maximum bitrate of 510 kbps for stereo, which is 255 kbps per channel.
Actual bitrate depends on the speed of your network and strength of your signal.
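The string replace above only does anything if the SDP happens to contain 'useinbandfec=1'. As a rough sketch of a slightly more defensive approach, the helper below looks up the Opus payload type and merges the parameters into its fmtp line. The name setOpusParams and the sample SDP are my own, for illustration only; they are not part of WebRTC or of the project linked below.

```javascript
// Hypothetical helper (not a WebRTC API): merge parameters into the
// Opus fmtp line of an SDP string. Returns the SDP unchanged if no
// Opus rtpmap or matching fmtp line is found.
function setOpusParams(sdp, params) {
  // Find the payload type mapped to Opus, e.g. "a=rtpmap:111 opus/48000/2".
  const rtpmap = sdp.match(/^a=rtpmap:(\d+) opus\/48000\/2\r?$/im);
  if (!rtpmap) return sdp;

  const fmtpLine = new RegExp(`^(a=fmtp:${rtpmap[1]} )([^\\r\\n]*)`, 'm');
  const names = Object.keys(params);

  return sdp.replace(fmtpLine, (match, prefix, existing) => {
    // Keep existing parameters unless we are about to override them.
    const kept = existing
      .split(';')
      .map((p) => p.trim())
      .filter((p) => p && !names.includes(p.split('=')[0]));
    const added = names.map((name) => `${name}=${params[name]}`);
    return prefix + [...kept, ...added].join(';');
  });
}

// Example with a made-up, heavily shortened SDP fragment.
const exampleSdp =
  'm=audio 9 UDP/TLS/RTP/SAVPF 111\r\n' +
  'a=rtpmap:111 opus/48000/2\r\n' +
  'a=fmtp:111 minptime=10;useinbandfec=1\r\n';

const munged = setOpusParams(exampleSdp, { stereo: 1, maxaveragebitrate: 510000 });
console.log(munged);

// Per-channel ceiling implied by maxaveragebitrate for two channels, in kbps.
console.log(510000 / 2 / 1000);
```

In the answer flow above you would call it as answer.sdp = setOpusParams(answer.sdp, {...}) before setLocalDescription. Calling it twice gives the same result, because existing values for the same parameter names are replaced rather than duplicated.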
More information about the SDP:
The Session Description Protocol (SDP) [RFC4566] describes various aspects of multimedia session such as media capabilities, transport addresses and related metadata in a transport agnostic manner, for the purposes of session announcement, session invitation and parameter negotiation.
https://tools.ietf.org/id/draft-nandakumar-rtcweb-sdp-01.html#rfc.section.3
Check out my project which implements these features: https://github.com/kmturley/webrtc-radio
Firstly, it's worth saying that WebRTC builds on the underlying network connectivity, and if that is poor then there is very little any higher layer can do to compensate.
Looking at the particular comparison you have highlighted, there are a couple of factors which are key to VoIP voice quality (assuming you are focused on voice from the question):
- Latency: to avoid delay and echo, voice communication needs a low end to end latency. The target for good quality VoIP systems is usually sub 200 ms latency.
- Jitter: this is essentially the variance in the latency, i.e. how the end to end delay varies over time.
- Packet loss: voice is actually reasonably tolerant to packet loss compared to data. VoIP targets are typically in the 1% or less range.
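To make the three factors concrete, here is a minimal sketch that computes them from a list of per-packet one-way delays. The function name, the input format (null for a packet that never arrived) and the sample numbers are all made up for illustration. The jitter figure is a plain mean of the differences between consecutive delays; RTP itself (RFC 3550) uses a smoothed running estimate instead.

```javascript
// Hypothetical input: one entry per packet sent, holding the one-way
// delay in ms, or null if the packet was lost.
function summarizeDelays(delaysMs) {
  const received = delaysMs.filter((d) => d !== null);

  // Packet loss: share of sent packets that never arrived.
  const lossPct = (100 * (delaysMs.length - received.length)) / delaysMs.length;

  // Latency: mean delay of the packets that did arrive.
  const meanLatencyMs = received.reduce((sum, d) => sum + d, 0) / received.length;

  // Jitter (simplified): mean absolute change in delay between
  // consecutive received packets.
  let diffSum = 0;
  for (let i = 1; i < received.length; i++) {
    diffSum += Math.abs(received[i] - received[i - 1]);
  }
  const jitterMs = received.length > 1 ? diffSum / (received.length - 1) : 0;

  return { meanLatencyMs, jitterMs, lossPct };
}

// Made-up sample: delays alternate between 100 and 120 ms, one packet lost.
console.log(summarizeDelays([100, 120, 100, 120, null]));
```

For this sample the mean latency is well inside the sub 200 ms target, but the 20% loss is far above the roughly 1% that VoIP systems aim for.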
Comparing this with streamed radio and similar services, the key point is the latency: it is not unusual to wait several seconds for a stream to start playing back.
This allows the receiver to fill a much bigger buffer of packets waiting to be decoded and played back, and makes it much more tolerant of variations in the latency (jitter).
Taking a simple example, if you had a brief half-second interruption in your connection, this would immediately impact a two-way VoIP call, but it might not impact streamed audio at all, assuming the network recovers fully and the buffer had several seconds' worth of content in it at the time.
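The half-second example can be played through with a toy model. This is only a sketch under simplifying assumptions of my own: audio arrives at exactly real-time rate while the network is up, nothing arrives during the outage, and the player simply goes silent when its buffer is empty. The function name and all the numbers are illustrative.

```javascript
// Toy playout-buffer model. Returns the total audible dropout in ms.
// bufferMs is the amount of audio already buffered when playback starts.
function simulateDropout(bufferMs, outageStartMs, outageMs, totalMs, stepMs = 10) {
  let buffered = bufferMs;
  let dropoutMs = 0;

  for (let t = 0; t < totalMs; t += stepMs) {
    const networkUp = t < outageStartMs || t >= outageStartMs + outageMs;
    if (networkUp) buffered += stepMs; // audio arrives at real-time rate

    if (buffered >= stepMs) {
      buffered -= stepMs;              // play one step of audio
    } else {
      dropoutMs += stepMs;             // buffer empty: audible gap
    }
  }
  return dropoutMs;
}

// A 500 ms outage starting 1 s into a 3 s session.
console.log(simulateDropout(100, 1000, 500, 3000));  // small VoIP-style buffer
console.log(simulateDropout(5000, 1000, 500, 3000)); // several seconds buffered
```

With a 100 ms buffer the listener hears 400 ms of silence, while the 5 second buffer absorbs the whole outage. The price of the big buffer is the multi-second start-up delay, which is exactly what a two-way call cannot afford.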
So the quality differences you are seeing compared to streamed audio are most likely related to the real-time nature of the communication rather than to inherent WebRTC faults. Or, maybe more precisely: even if WebRTC were perfect, real-time two-way VoIP is very susceptible to network conditions.
As a note, video clearly needs much more bandwidth and is also impacted by the network, but people tend to be more tolerant of video 'stutters' than of voice quality issues in multimedia calls (at this time, anyway).