How can we secure communication of an unchangeable app (Zoom)?
You cannot magically secure applications like Zoom without changing the application and the infrastructure it relies on.
The missing end-to-end encryption you want fixed follows from Zoom's basic architecture, in which media streams are processed and mixed on a central server (owned by Zoom). Only this architecture lets it perform well without stressing the bandwidth and CPU of the endpoints when many users are involved. With E2E, the CPU and bandwidth requirements at each endpoint would instead grow linearly with the number of users and would quickly overwhelm clients.
These kinds of restrictions apply to any video conferencing solution. This means you will not get real E2E with any other solution either, at least not if you want conferences that scale to many users without excessive bandwidth and CPU requirements. The best you can get is to control the central mixing and forwarding server yourself, so that you don't need to trust a third party.
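To make the scaling argument concrete, here is a rough back-of-the-envelope sketch. It assumes a simplified model that is not taken from Zoom's documentation: with a central mixing server each client uploads one stream and downloads one mixed stream, while with naive E2E every client sends its own encrypted stream to every other participant and receives one from each. The function names and the 1500 kbit/s bitrate are hypothetical and serve only as illustration.

```python
def central_server_load(participants: int, stream_kbps: int) -> dict:
    """Per-client load when a central server mixes all streams (simplified model)."""
    # One stream up to the server, one mixed stream back down,
    # regardless of how many participants are in the meeting.
    return {"upload_kbps": stream_kbps, "download_kbps": stream_kbps}


def naive_e2e_load(participants: int, stream_kbps: int) -> dict:
    """Per-client load when each client exchanges streams with every peer."""
    peers = participants - 1
    # One outgoing and one incoming stream per peer: linear in the meeting size.
    return {
        "upload_kbps": peers * stream_kbps,
        "download_kbps": peers * stream_kbps,
    }


if __name__ == "__main__":
    bitrate = 1500  # hypothetical per-stream bitrate in kbit/s
    for n in (2, 10, 50):
        print(n, central_server_load(n, bitrate), naive_e2e_load(n, bitrate))
```

Under these assumptions a 50-person meeting asks each client for 49 times the upload bandwidth of the central-server case, which is the linear growth described above. Real systems fall somewhere between these two extremes, so treat the numbers as an illustration only.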
Even the broken AES ECB mode could not be fixed without changing the application and the infrastructure, since the server expects the encryption to be done in a specific way; if you change it on your side only, the communication will fail.
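For readers wondering why ECB is called broken: it processes every block independently, so identical plaintext blocks produce identical ciphertext blocks and patterns in the data remain visible. The sketch below illustrates only that property. It does not use AES: a truncated HMAC-SHA256 stands in for the block cipher (an assumption made to keep the example stdlib-only), so the output cannot be decrypted, and the function names are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16  # AES block size in bytes


def toy_block_function(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher: a keyed, deterministic function of one block."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def split_blocks(data: bytes) -> list:
    """Split data into 16-byte blocks (the last block may be shorter)."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_like(key: bytes, data: bytes) -> list:
    """Process each block independently, as ECB mode does."""
    return [toy_block_function(key, b) for b in split_blocks(data)]


def ctr_like(key: bytes, nonce: bytes, data: bytes) -> list:
    """XOR each block with a keystream derived from nonce + counter, as CTR mode does."""
    out = []
    for counter, block in enumerate(split_blocks(data)):
        keystream = toy_block_function(key, nonce + counter.to_bytes(8, "big"))
        out.append(bytes(p ^ k for p, k in zip(block, keystream)))
    return out


if __name__ == "__main__":
    key = b"k" * 16
    data = b"A" * BLOCK * 3  # three identical plaintext blocks
    print(len(set(ecb_like(key, data))))                # repeated blocks stay repeated
    print(len(set(ctr_like(key, b"n" * 8, data))))      # counter makes each block differ
```

With the ECB-like function, the three identical input blocks map to one repeated output block, while the counter-based variant hides the repetition. Switching modes on the client alone would still not help with Zoom, because the server side has to agree on the scheme.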
Usage of a VPN would not magically solve the problem. The data would still need to be processed on the servers owned by Zoom.
TL;DR: Specifically for Zoom, take a look at Zoom Meeting Connector.
First off, to get it out of the way: encrypted is not the same as secure, and "secure" can be vague depending on context.
As schroeder♦ has commented, you need to be clear about what you are actually trying to achieve and what threats you are defending against. Only then can you determine whether a solution really solves your problem. It might turn out that E2E encryption isn't what you actually need or want. And, as Steffen Ullrich said, you can't just magically add it without significant changes to both the application and the infrastructure.
Fortunately, in the case of Zoom, there is a relatively easy way out (depending on your actual needs). Zoom allows you to run your own server for streaming audio and video while still using Zoom's servers for other management tasks.
(From https://support.zoom.us/hc/en-us/articles/201363113-Meeting-Connector-Core-Concepts)
Zoom offers a public or hybrid cloud service. In the hybrid cloud service, you deploy meeting communication servers known as the Zoom Meeting Connector within your company's internal network. In doing so, user and meeting metadata are managed in the public cloud while the meetings are hosted in your private cloud. All meeting traffic including video, voice and data sharing goes through the on-premise Zoom Meeting Connector.
This way, the conference data stays on a server you control. Even if you make a call from outside the internal network, the traffic is still encrypted in transit and only decrypted on (your) meeting server. Deployment should not be difficult for a corporate IT team, though it might be challenging for laymen. If you want the privacy of E2EE, this is about as close as you can get without actually changing the software or rolling your own service.
While it is not possible to make E2E encrypted connections with Zoom, your company can probably use an open-source, self-hosted solution like Jitsi. It encrypts the connections between the participants and the server, so only the participants and the server see the data unencrypted.
So, because you can host the server wherever you want, if you control both the clients and the server, you control your data.
Of course, if you use an instance that you don't control, you have to trust the instance owner. Similarly, if you use Zoom Meeting Connector, you still have to trust Zoom not to leak your audio, voluntarily or not. (Not saying they will; that's just a possibility.)
(From a usability point of view, Jitsi is quite similar to Zoom; it is not as feature-rich, but it works really well.)