Why 16-bit address with 12-bit offset results in 4KB page size?
This is assuming byte-addressed memory (which almost every machine made in the past 30 years uses), so each address refers to a byte, not an entry or address or any other larger value. To hold a 16-bit value, you'll need two consecutive addresses (two bytes).
More than 30 years ago, there used to be machines which were word addressed, which worked like you surmise. But such machines had a tough time dealing with byte-oriented data (such as ASCII characters), and so have fallen out of favor. Nowadays, things like byte addressability, 8-bit bytes and twos-complement integers are pretty much just assumed.
The 12 bits are an offset within a page. The offset is in bytes, not addresses. 2^12 is 4096.
Because with 12 bit, we can address 2^12=4096
slots. Each slot represents an address which size is 1 byte in byte-addressable memory. Hence the total size is 4096*1=4096 bytes = 4KB.