What is the difference between Workers and Threads in Puma
This is a big area and I am not an expert, however...
Puma can spawn many workers, and each worker can use many threads to process requests.
Unicorn does not have threads as far as I know, it just has the worker model.
If you use threads though, you need to make sure that your code is thread safe. This means Rails, any gem you rely on, and your own code.
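For maximum performance, you might also want to look into JRuby or Rubinius, which can run threads in parallel. MRI is restricted by its GIL (global interpreter lock): only one thread runs Ruby code at a time, although threads still help when requests spend time waiting on I/O.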
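To make the GIL point a little more concrete, here is a minimal sketch in plain Ruby (no Puma involved). It assumes a request that mostly waits, simulated with `sleep`; the method name `fake_io_request` and the timings are made up for illustration. Even on MRI, the threaded version should finish in well under the serial time, because a sleeping or I/O-blocked thread does not hold the lock.

```ruby
require "benchmark"

# Hypothetical stand-in for a request that spends its time waiting on I/O.
# While a thread sleeps (or blocks on I/O), MRI lets other threads run.
def fake_io_request
  sleep(0.2)
end

# Four requests handled one after another.
serial = Benchmark.realtime { 4.times { fake_io_request } }

# Four requests handled by four threads at once.
threaded = Benchmark.realtime do
  4.times.map { Thread.new { fake_io_request } }.each(&:join)
end

puts format("serial:   %.2fs", serial)
puts format("threaded: %.2fs", threaded)
```

CPU-bound work would not show the same gain on MRI, which is where JRuby or extra workers come in.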
There is a good article on Heroku which explains how Puma uses workers and threads. You should probably read that and ignore me :)
As the other answer states, this Heroku article is pretty good with explanations of certain configuration items.
However, if you need to tune your application on Heroku, or anywhere else, then it pays to know how things work.
I think you are almost correct when you say "a worker is a thread inside the puma process". I believe a worker is an operating-system-level process forked from puma, which can then use threads internally.
As far as I understand, puma will fork its operating system process however many times you set via the workers configuration to respond to HTTP requests. This gives you parallelism in handling multiple requests, but it will usually take up more memory, as each worker holds its own 'copy' of your application code.
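The process-then-threads layout can be sketched in plain Ruby. This is not how puma is implemented, just a toy model of the same shape: a parent process forks a few workers, and each worker runs a few threads. The constants `WORKER_COUNT` and `THREADS_PER_WORKER` are illustrative, and the sketch assumes a platform where `fork` is available (so not Windows).

```ruby
WORKER_COUNT = 2        # illustrative value
THREADS_PER_WORKER = 3  # illustrative value

master_pid = Process.pid
reader, writer = IO.pipe

# Fork one OS process per worker; each worker starts its own threads.
worker_pids = WORKER_COUNT.times.map do
  fork do
    reader.close
    threads = THREADS_PER_WORKER.times.map do |i|
      Thread.new do
        sleep(0.05) # stand-in for handling a request
        "#{Process.pid}:#{i}"
      end
    end
    # Report "pid:thread_index" for every thread in this worker.
    writer.puts(threads.map(&:value).join(","))
    writer.close
    exit!(0) # leave the child without running hooks inherited from the parent
  end
end

writer.close
lines = reader.read.split("\n")
reader.close
worker_pids.each { |pid| Process.wait(pid) }

reports = lines.flat_map { |line| line.split(",") }
seen_pids = reports.map { |report| report.split(":").first.to_i }.uniq

puts "master pid:  #{master_pid}"
puts "worker pids: #{seen_pids.sort.inspect}"
puts "thread reports: #{reports.size}"
```

Every thread reports the pid of the worker it lives in, and none of them reports the master's pid.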
Each puma worker will then use multiple threads within its OS process, depending on the threads configuration. These add concurrency by allowing a single puma process to respond to multiple requests itself, so that if one thread is blocked, i.e. busy processing a request, it can handle a new request with another thread. As stated, this requires your entire application to be thread safe so that, for example, any global configuration from one request does not 'leak' into another.
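Here is a small sketch of the kind of 'leak' meant above, in plain Ruby. Everything in it is hypothetical (`UnsafeConfig`, `SafeConfig`, `handle_request`, the locale values): one class keeps a per-request setting in shared class-level state, the other keeps it per thread via `Thread.current[]` (strictly speaking that storage is fiber-local, which is equivalent here since no extra fibers are used).

```ruby
# Hypothetical per-request setting kept in shared, class-level state.
class UnsafeConfig
  class << self
    attr_accessor :locale
  end
end

# The same setting kept per thread instead of globally.
class SafeConfig
  def self.locale=(value)
    Thread.current[:locale] = value
  end

  def self.locale
    Thread.current[:locale]
  end
end

# Simulated request: set the locale, do some blocking work, read it back.
def handle_request(config, locale, delay)
  config.locale = locale
  sleep(delay)
  config.locale
end

# A slow request starts first; a fast one arrives while it is still blocked.
def run(config)
  slow = Thread.new { handle_request(config, "fr", 0.2) }
  sleep(0.05)
  fast = Thread.new { handle_request(config, "en", 0.0) }
  [slow.value, fast.value]
end

unsafe_result = run(UnsafeConfig)
safe_result = run(SafeConfig)

puts "shared state:     #{unsafe_result.inspect}"
puts "per-thread state: #{safe_result.inspect}"
```

With shared state, the slow request ends up reading the locale set by the fast one; with per-thread state each request sees its own value.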
You would tune puma so that the number of workers is adequate for the number of CPUs and the memory available. You would then tune the threads depending on how much you want to saturate the host running your application and on how your application behaves. More does not always mean faster or higher request throughput.
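As a starting point only, a `config/puma.rb` along these lines is one way to express that tuning. The environment variable names (`WEB_CONCURRENCY`, `RAILS_MAX_THREADS`) and the default of 5 threads are assumptions borrowed from common Heroku/Rails conventions, not something puma requires; measure with your own app before settling on numbers.

```ruby
# config/puma.rb (a sketch; values and env var names are assumptions)
require "etc"

# One worker (forked OS process) per CPU core unless overridden.
# Each worker holds its own copy of the app, so memory is the usual limit.
workers Integer(ENV.fetch("WEB_CONCURRENCY") { Etc.nprocessors })

# Minimum and maximum threads per worker.
# More threads helps I/O-heavy apps; the app must be thread safe.
max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS") { 5 })
threads 1, max_threads

# Load the app before forking so workers can share memory where the OS allows.
preload_app!
```

With this layout the rough upper bound on requests being handled at once is workers times max threads.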