Is information entropy the same as thermodynamic entropy?
So Pratchett's quote seems to be about energy, rather than entropy. I suppose you could claim otherwise if you assume "entropy is knowledge," but I think that's exactly backwards: I think that knowledge is a special case of low entropy. But your question is still interesting.
The entropy $S$ in thermodynamics is related to the number of indistinguishable states that a system can occupy. If all the indistinguishable states are equally probable, the number of "microstates" associated with a system is $\Omega = \exp( S/k )$, where the constant $k\approx\rm25\,meV/300\,K$ is related to the amount of energy exchanged by thermodynamic systems at different temperatures.
The canonical example is a jar of pennies. Suppose I drop 100 coins on the floor. There are 100 ways that I can have one heads-up and the rest tails-up; there are $100\cdot99/2$ ways to have two heads; there are $100\cdot99\cdot98/6$ ways to have three heads; there are about $10^{28}$ ways to have forty heads, and $10^{29}$ ways to have fifty heads. If you drop a jar of pennies you're not going to find them 3% heads up, any more than you're going to get struck by lightning while you're dealing yourself a royal flush: there are just too many other alternatives.
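If you don't trust my arithmetic, a couple of lines of Python (my own quick check, nothing more) reproduce these counts:

```python
# Count the ways to have k of 100 pennies heads-up: "100 choose k".
from math import comb

for k in (1, 2, 3, 40, 50):
    print(f"{k:2d} heads: {comb(100, k):.3e} arrangements")

# 1 heads:  1.000e+02    -> 100
# 2 heads:  4.950e+03    -> 100*99/2
# 3 heads:  1.617e+05    -> 100*99*98/6
# 40 heads: 1.375e+28
# 50 heads: 1.009e+29
```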
The connection to thermodynamics comes when not all of my microstates have the same energy, so that my system can exchange energy with its surroundings by having transitions. For instance, suppose my 100 pennies aren't on the floor of my kitchen, but they're in the floorboard of my pickup truck with the out-of-balance tire. The vibration means that each penny has a chance of flipping over, which will tend to drive the distribution towards 50-50. But if there is some other interaction that makes heads-up more likely than tails-up, then 50-50 isn't where I'll stop. Maybe I have an obsessive passenger who flips over all the tails-up pennies. If the shaking and random flipping over is slow enough that he can flip them all, that's effectively "zero temperature"; if the shaking and random flipping is so vigorous that a penny usually flips itself before he corrects the next one, that's "infinite temperature." (This is actually part of the definition of temperature.)
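Here is a toy Monte Carlo of that pickup-truck picture (my own sketch, with made-up rates, just to illustrate the ratio that plays the role of temperature): shaking randomizes pennies, the passenger turns tails-up pennies heads-up, and the equilibrium fraction of heads depends only on which process is faster.

```python
# Toy model: each step, either the "shaking" re-flips a random penny 50/50,
# or the "passenger" sets one tails-up penny back to heads-up.
import random

def heads_fraction(shake_prob, steps=100_000, n=100, seed=1):
    rng = random.Random(seed)
    coins = [1] * n                          # 1 = heads-up, 0 = tails-up
    for _ in range(steps):
        if rng.random() < shake_prob:        # vibration: random re-flip
            coins[rng.randrange(n)] = rng.randint(0, 1)
        else:                                # obsessive passenger: fix one tail
            tails = [i for i, c in enumerate(coins) if c == 0]
            if tails:
                coins[rng.choice(tails)] = 1
    return sum(coins) / n

for shake_prob in (0.05, 0.9, 0.999):
    print(shake_prob, heads_fraction(shake_prob))
# 0.05  -> ~1.0   the passenger keeps up: effectively zero temperature
# 0.9   -> ~0.6   intermediate temperature
# 0.999 -> ~0.5   the shaking wins: effectively infinite temperature
```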
The Boltzmann entropy I used above, $$ S_B = k_B \ln \Omega, $$ is exactly the same as the Shannon entropy, $$ S_S = k_S \ln \Omega, $$ except that Shannon's constant is $k_S = (\ln 2)^{-1}\rm\,bit$, so that a system with ten bits of information entropy can be in any one of $\Omega=2^{10}$ states.
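To see that these really are the same number in different units, here's a two-line check (my own, using $\Omega = 2^{10}$):

```python
# One Omega, two unit systems: ln(Omega) measured in J/K versus in bits.
from math import log

k_B = 1.380649e-23            # J/K
omega = 2 ** 10               # ten bits' worth of states

print(k_B * log(omega))       # Boltzmann: ~9.57e-23 J/K
print(log(omega) / log(2))    # Shannon:   10.0 bits
```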
This is a statement with physical consequences. Suppose that I buy a two-terabyte SD card (apparently the standard supports this) and I fill it up with forty hours of video of my guinea pigs turning hay into poop. By reducing the number of possible states of the SD card from $\Omega=2^{\,2\times2^{40}\times8}$ (one state for every possible arrangement of its roughly $1.8\times10^{13}$ bits) to one, Boltzmann's definition tells me I have reduced the thermodynamic entropy of the card by $\Delta S = k\ln\Omega \approx 1.7\times10^{-10}\rm\,J/K$. That entropy reduction must be balanced by an equal or larger increase in entropy elsewhere in the universe, and if I do this at room temperature that entropy increase must be accompanied by a heat flow of $\Delta Q = T\Delta S \approx 5\times10^{-8}\rm\,J$.
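The arithmetic, in case you want to redo it for your own card (I'm assuming binary terabytes here; the decimal kind changes nothing important):

```python
# Minimum (Landauer-style) entropy and heat budget for fixing every bit
# of a 2 TB card at room temperature.
from math import log

k_B = 1.380649e-23            # J/K
T = 300.0                     # K
n_bits = 2 * 2**40 * 8        # ~1.8e13 bits

delta_S = n_bits * k_B * log(2)   # k * ln(Omega), with Omega = 2**n_bits
delta_Q = T * delta_S

print(delta_S)                # ~1.7e-10 J/K
print(delta_Q)                # ~5e-8 J: tens of nanojoules
```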
And here we come upon practical, experimental evidence for one difference between information and thermodynamic entropy. Power consumption while writing an SD card is milliwatts or watts, and transferring my forty-hour guinea pig movie will not be a brief operation --- that extra $5\times10^{-8}\rm\,J$, barely enough to warm a gram of water by ten nanokelvin, that I have to pay for knowing every single bit on the SD card is nothing compared to the other costs for running the device.
The information entropy is part of, but not nearly all of, the total thermodynamic entropy of a system. The thermodynamic entropy includes state information about every atom of every transistor making up every bit, and in any bi-stable system there will be many, many microscopic configurations that correspond to "on" and many, many distinct microscopic configurations that correspond to "off."
CuriousOne asks,
How comes that the Shannon entropy of the text of a Shakespeare folio doesn't change with temperature?
This is because any effective information storage medium must operate at effectively zero temperature --- otherwise bits flip and information is destroyed. For instance, I have a Complete Works of Shakespeare which is about 1 kg of paper and has an information entropy of maybe a few megabytes.
This means that when the book was printed there was a minimum extra energy expenditure of order $10^{-13}\rm\,J$ --- a few megabytes' worth of bits at $kT\ln2\approx18\rm\,meV$ per bit --- associated with putting those words on the page in that order rather than any others. Knowing what's in the book reduces its entropy. Knowing whether the book is sonnets first or plays first reduces its entropy further. Knowing that "Trip away/Make no stay/Meet me all by break of day" is on page 158 reduces its entropy still further, because if your brain is in the low-entropy state where you know Midsummer Night's Dream you know that it must start on page 140 or 150 or so. And me telling you each of these facts and concomitantly reducing your entropy was associated with an extra energy cost of no more than a few eV, totally lost in my brain metabolism, the mechanical energy of my fingers, the operation energy of my computer, the operation energy of my internet connection to the disk at the StackExchange data center where this answer is stored, and so on.
If I raise the temperature of this Complete Works from 300 K to 301 K, I add roughly a kilojoule of heat (a kilogram of paper has a heat capacity of order $1\rm\,kJ/K$) and so raise its entropy by $\Delta S = \Delta Q/T \approx 3\rm\,J/K$, which corresponds to something like $10^{23}$ bits of information; however the book is cleverly arranged so that the information that is disorganized doesn't affect the arrangements of the words on the pages. If, however, I try to store an extra megajoule of energy in this book, then somewhere along its path to a temperature of 1300 kelvin it will transform into a pile of ashes. Ashes are high-entropy: it's impossible to distinguish ashes of "Love's Labours Lost" from ashes of "Timon of Athens."
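The contrast is easy to put in numbers (my own rough figures: a few megabytes of text, and roughly $1\rm\,kJ/K$ assumed for the heat capacity of a kilogram of paper):

```python
# Information in the text versus entropy of mundane heating.
from math import log

k_B = 1.380649e-23                 # J/K
T = 300.0                          # K

# The words on the pages: a few megabytes of text (assumed).
n_bits = 3e6 * 8                   # ~2.4e7 bits
S_text = n_bits * k_B * log(2)     # ~2e-16 J/K
E_text = T * S_text                # ~7e-14 J minimum to "print" that order

# Warming the 1 kg book by one kelvin, heat capacity ~1 kJ/K (assumed).
S_heat = 1e3 * 1.0 / T             # dS = C dT / T, ~3 J/K
bits_heat = S_heat / (k_B * log(2))

print(S_text, E_text)              # the text: a rounding error
print(S_heat, f"{bits_heat:.1e}")  # heating: ~3 J/K, ~3e23 bits' worth
```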
The information entropy --- which has been removed from a system where information is stored --- is a tiny subset of the thermodynamic entropy, and you can only reliably store information in parts of a system which are effectively at zero temperature.
A monoatomic ideal gas of, say, argon atoms can also be divided into subsystems where the entropy does or does not depend on the temperature. Argon atoms have at least three independent ways to store energy: translational motion, electronic excitations, and nuclear excitations.
Suppose you have a mole of argon atoms at room temperature. The translational entropy is given by the Sackur-Tetrode equation, and does depend on the temperature. However the Boltzmann factor for the first excited state, about 11.6 eV up, is $$ \exp\frac{-11.6\rm\,eV}{k\cdot300\rm\,K} \approx 10^{-201} $$ and so the number of argon atoms in the first (or higher) excited states is exactly zero and there is zero entropy in the electronic excitation sector. The electronic excitation entropy remains exactly zero until the Boltzmann factors for all of the excited states add up to about $10^{-24}$, so that there is on average one excited atom in the mole; that happens somewhere around the temperature $$ T = \frac{-11.6\rm\,eV}{k\ln 10^{-24}} \approx 2500\rm\,K. $$ So as you raise the temperature of your mole of argon from 300 K to 500 K the number of excited atoms in your mole changes from exactly zero to exactly zero, which is a zero-entropy configuration, independent of the temperature, in a purely thermodynamic process.
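The same estimate in a few lines of Python (using the exact value of $k$ rather than the rounded 25 meV above, which is why the exponent comes out near $-195$ instead of $-201$; either way, not a single atom in the mole is excited):

```python
# Electronic excitation of argon: Boltzmann factor at 300 K, and the
# temperature at which a mole first holds about one excited atom.
from math import exp, log

k_eV = 8.617e-5                    # Boltzmann constant, eV/K
E1 = 11.6                          # eV, first electronic excitation of argon
N_A = 6.022e23                     # atoms per mole

boltz_300 = exp(-E1 / (k_eV * 300.0))
print(boltz_300 * N_A)             # ~1e-171 excited atoms: zero, in practice

T_one = -E1 / (k_eV * log(1e-24))  # Boltzmann factor reaches ~1e-24
print(T_one)                       # ~2400 K
```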
Likewise, even at tens of thousands of kelvin, the entropy stored in the nuclear excitations is zero, because the probability of finding a nucleus in its first excited state, an MeV or more above the ground state, is so small that the expected number of excited nuclei in your sample is many orders of magnitude less than one.
Likewise, the thermodynamic entropy of the information in my Complete Works of Shakespeare is, if not zero, very low: there are a small number of configurations of text which correspond to a Complete Works of Shakespeare rather than a Lord of the Rings or a Ulysses or a Don Quixote made of the same material with equivalent mass. The information entropy ("Shakespeare's Complete Works fill a few megabytes") tells me the minimum thermodynamic entropy which had to be removed from the system in order to organize it into a Shakespeare's Complete Works, and an associated energy cost with transferring that entropy elsewhere; those costs are tiny compared to the total energy and entropy exchanges involved in printing a book.
As long as the temperature of my book stays substantially below 506 kelvin, the probability of any letter in the book spontaneously changing to look like another letter or like an illegible blob is zero, and changes in temperature are reversible.
This argument suggests, by the way, that if you want to store information in a quantum-mechanical system you need to store it in the ground state, which the system will occupy at zero temperature; therefore you need to find a system which has multiple degenerate ground states. A ferromagnet has a degenerate ground state: the atoms in the magnet want to align with their neighbors, but the direction which they choose to align is unconstrained. Once a ferromagnet has "chosen" an orientation, perhaps with the help of an external aligning field, that direction is stable as long as the temperature is substantially below the Curie temperature --- that is, modest changes in temperature do not cause entropy-increasing fluctuations in the orientation of the magnet. You may be familiar with information-storage mechanisms operating on this principle.
Formally, the two entropies are the same thing. The Gibbs entropy, in thermodynamics, is $$S = -k_B \sum p_i \ln p_i$$ while the Shannon entropy of information theory is $$H = -\sum p_i \log_2 p_i.$$ These are equal up to a numerical factor: $S = (k_B \ln 2)\, H$. Given a statistical ensemble, you can calculate its (thermodynamic) entropy by computing the Shannon entropy and then multiplying by constants.
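A quick numerical check of that "same up to constants" claim, with a toy distribution of my own choosing:

```python
# The Gibbs entropy is the Shannon entropy rescaled by k_B * ln 2.
from math import log

k_B = 1.380649e-23

p = [0.5, 0.25, 0.125, 0.125]               # any probability distribution

H = -sum(pi * log(pi, 2) for pi in p)       # Shannon, in bits: 1.75
S = -k_B * sum(pi * log(pi) for pi in p)    # Gibbs, in J/K: ~1.67e-23

print(H, S, S / (k_B * log(2)))             # the last number is H again
```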
However, there is a sense in which you're right. Often when people talk about Shannon entropy, they only use it to count things that we intuitively perceive as information. For example, one might say the entropy of a transistor, flipped to 'on' or 'off' with equal likelihood, is 1 bit.
But the thermodynamic entropy of the transistor is thousands, if not millions of times higher, because it counts everything, i.e. the configurations of all the atoms making up the transistor. (If you want to explain it to your programmer colleagues, say they're not counting whether each individual atom is "on" or "off".)
In general, the amount of "intuitive" information (like bits, or words in a book) is a totally negligible fraction of the total entropy. The thermodynamic entropy of a library is about the same as that of a warehouse of blank books.
To be honest, I believe this question is not really settled, or at least that there is not yet a consensus in the scientific community about what the answer is.
My understanding of the relation is, I think, slightly different from that of knzhou, rob, or CuriousOne. In this view, thermodynamic entropy can be thought of as a particular application of information entropy. In particular, one can apply the principles of information and informational entropy to ask how much one knows about the state of a quantum system, and under certain conditions the thermodynamic Boltzmann entropy seems to be recovered.
As a concrete example, a recent experiment related to this question (1) studies the "entanglement entropy" of an interacting quantum system, which is an application of informational entropy to a quantum state. Under the appropriate circumstances (looking at the single-particle density matrix of a thermalized quantum state), this informational entropy is shown to be identical to the thermodynamic Boltzmann entropy.
From this viewpoint, thermodynamics is "just" a particular application of informational principles. Of course, one can also apply informational principles to entirely different systems such as books and radio communications and so on. As a result, thermodynamic and informational entropies are not the same, but are two particular applications of the same general principle.
However, this opinion is by no means shared by all, and while this correspondence seems to work in some cases like the above experiment, it remains to be explained in a more general setting.
Two somewhat related questions that you might find interesting:
Spontaneous conversion of heat into work at negative temperatures
What are the phenomena responsible for irreversible increase in entropy?
Appendix: Entropy Hierarchy
Here is the hierarchy of entropies I am claiming (ignoring constants like $k_B$):
Shannon entropy: $S_\textrm{Shannon}=-\sum_i p_i \log p_i$. Describes, roughly, how much one knows about the state of some system, with $i$ being the possible states. This system could be, for example, a string of binary bits.
Applying this to an unknown quantum state, one gets the Gibbs entropy: $S_\textrm{Gibbs}=-\sum_i p_i \log p_i$, where the $i$ are now specifically the possible quantum states of the system. For this expression to make physical sense, $i$ must be the eigenstates of the system in a basis in which the density matrix is diagonal*. With this stipulation, $S_\textrm{Gibbs}$ is identical to the von Neumann entropy of a quantum state: $S_\textrm{VN}=-\text{tr}(\rho \log \rho)$, with $\rho$ the density matrix.
The entanglement entropy is simply an application of $S_\textrm{VN}$ to a particular spatial subset of a (usually isolated) system: $S_{EE,A}=-\text{tr}(\rho_A \log \rho_A)$, where $\rho_A$ is the density matrix resulting from the partial trace over the density matrix of a large system, keeping only some local subsystem. In other words, it is the entropy of a particular part of some larger system; a small numerical illustration appears at the end of this appendix.
The highly nontrivial claim made in (1) (and elsewhere) is that for a thermalized system, the $S_{EE,A}$ of a small local subsystem $\rho_A$ is equivalent to the Boltzmann thermodynamic entropy, defined as: $S_\textrm{Boltz}=-\sum_i p_{i,\textrm{th}} \log p_{i,\textrm{th}}$, with $p_{i,\textrm{th}}=\frac{e^{-E_i/k_B T}}{\sum_j e^{-E_j/k_B T}}$, $i$ running over the possible states of $\rho_A$, and $k_B T$ chosen so that the system has the correct average energy. This claim is closely related, by the way, to the "eigenstate thermalization hypothesis."
*There's nothing too mysterious about this requirement: it is simply because for entropy to have some "nice" properties like additivity the state $i$ must be uncorrelated.
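As a small numerical illustration of the last two items in this hierarchy (my own sketch, not taken from (1)): for a two-qubit Bell pair the full state is pure, so its von Neumann entropy is zero, but tracing out one qubit leaves a reduced density matrix carrying exactly one bit of entanglement entropy.

```python
# Entanglement entropy of one qubit of the Bell state (|00> + |11>)/sqrt(2).
import numpy as np

psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # basis order: 00, 01, 10, 11
rho = np.outer(psi, psi)                             # full (pure) density matrix

# Partial trace over the second qubit: rho_A[a, a'] = sum_b rho[ab, a'b]
rho_A = np.trace(rho.reshape(2, 2, 2, 2), axis1=1, axis2=3)

def von_neumann_bits(r):
    """S = -tr(r log2 r), computed from the eigenvalues."""
    w = np.linalg.eigvalsh(r)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

print(von_neumann_bits(rho))     # 0.0 : the whole system is in a definite state
print(von_neumann_bits(rho_A))   # 1.0 : one bit of entanglement entropy
```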