This doesn't really make entropy itself observer dependent. (Shannon) entropy is a property of a distribution. It's just that when you're measuring different observers' beliefs, you're looking at different distributions (which can have different entropies the same way they can have different means, variances, etc).
Entropy is a property of a distribution, but since math does sometimes get applied, we also attach distributions to things (eg. the entropy of a random number generator, the entropy of a gas...). Then when we talk about the entropy of those things, those entropies are indeed subjective, because different subjects will attach different probability distributions to that system depending on their information about that system.
Some probability distributions are objective. The probability that my random number generator gives me a certain number is given by a certain formula. Describing it with another distribution would be wrong.
Another example, if you have an electron in a superposition of half spin-up and half spin-down, then the probability to measure up is objectively 50%.
Another example, GPT-2 is a probability distribution on sequences of integers. You can download this probability distribution. It doesn't represent anyone's beliefs. The distribution has a certain entropy. That entropy is an objective property of the distribution.
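As a minimal sketch of that claim (assuming the standard Hugging Face `transformers` API; the distribution over whole sequences isn't tractable to evaluate directly, but its per-step conditional is), you can load the weights and compute the Shannon entropy of the next-token distribution, with nobody's beliefs involved:

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    ids = tok("Entropy is a property of", return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]      # logits for the next token
    logp = torch.log_softmax(logits, dim=-1)   # the model's next-token distribution
    entropy_nats = -(logp.exp() * logp).sum()  # Shannon entropy of that distribution
    print(entropy_nats.item())

Anyone who downloads the same weights gets the same number (up to floating-point noise), which is the sense in which it's an objective property of the distribution.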
Of those, the quantum superposition is the only one that has a chance at being considered objective, and it's still only "objective" in the sense that (as far as we know) your description provided as much information as anyone can possibly have about it, so nobody can have a more-informed opinion and all subjects agree.
The others are both partial-information problems which are very sensitive to knowing certain hidden-state information. Your random number generator gives you a number that you didn't expect, and for which a formula describes your best guess based on the available incomplete information, but the computer program that generated it knew which one to choose and would not have picked any other. Anyone who knew the hidden state of the RNG would also have assigned a different probability to that number being chosen.
You might have some probability distribution in your head for what will come out of GPT-2 on your machine at a certain time, based on your knowledge of the random seed. But that is not the GPT-2 probability distribution, which is objectively defined by model weights that you can download, and which does not correspond to anyone’s beliefs.
I'm of the view that strictly speaking, even a fair die doesn't have a probability distribution until you throw it. It just so happens that, unless you know almost every detail about the throw, the best you can usually do is uniform.
So I would say the same of GPT-2. It's not a random variable unless you query it. But unless you know unreasonably many details, the best you can do to predict the query is the distribution that you would call "objective."
I think this gets into unanswerable metaphysical questions about when we can say mathematical objects, propositions, etc. really exist.
But I think if we take the view that it's not a random variable until we query it, that makes it awkward to talk about how GPT-2 (and similar models) is trained. No one ever draws samples from the model during training, but the whole justification for the cross-entropy-minimizing training procedure is based on thinking about the model as a random variable.
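For what it's worth, a stripped-down version of that training step (a toy stand-in, not GPT-2 itself) shows the tension: the loss treats the model as a distribution and evaluates it at tokens taken from the corpus, but nothing is ever sampled from it.

    import torch
    import torch.nn.functional as F

    vocab_size, d_model = 100, 16
    model = torch.nn.Linear(d_model, vocab_size)      # toy stand-in for a language model

    contexts = torch.randn(8, d_model)                # 8 encoded contexts from the corpus
    next_tokens = torch.randint(0, vocab_size, (8,))  # the tokens that actually followed them

    logits = model(contexts)
    # Cross-entropy between the empirical data and the model's conditional
    # distribution over the vocabulary; the model is used as a probability
    # distribution throughout, yet we never draw a single sample from it.
    loss = F.cross_entropy(logits, next_tokens)
    loss.backward()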
A more plausible way to argue for objectiveness is to say that some probability distributions are objectively more rational than others given the same information. E.g. when seeing a symmetrical die it would be irrational to give 5 a higher probability than the others. Or it seems irrational to believe that the sun will explode tomorrow.
The probability distribution is subjective in both cases -- because, once again, it depends on the observer observing the events in order to build a probability distribution.
E.g. your random number generator generates 1, 5, 7, 8, 3 when you run it. It generates 4, 8, 8, 2, 5 when I run it. I.e. we have received different information about the random number generator to build our subjective probability distributions. The level of entropy of our probability distributions is high because we have so little information to be certain about the representativeness of our distribution sample.
If we continue running our random number generator for a while, we will gather more information, thus reducing entropy, and our probability distributions will both start converging towards an objective "truth." If we ran our random number generators for a theoretically infinite amount of time, we would have reduced entropy to 0 and would have a perfect, objective probability distribution.
Would you say that all claims about the world are subjective, because they have to be based on someone’s observations?
For example my cat weighs 13 pounds. That seems objective, in the sense that if two people disagree, only one can be right. But the claim is based on my observations. I think your logic leads us to deny that anything is objective.
I do believe in objective reality, but probabilities are subjective. Your cat weighs 13 pounds, and now that you've told me, I know it too. If you asked me to draw a probability distribution for the weight of your cat, I'd draw a tight gaussian distribution around that, representing the accuracy of your scale. My cat weighs a different amount, but I won't tell you how much, so if we both draw a probability distribution, they'll be different. And the key thing is that neither of us has an objectively correct probability distribution, not even me. My cat's weight has an objectively correct value which even I don't know, because my scale isn't good enough.
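To put rough numbers on it (the scale accuracies here are invented for illustration): the two beliefs are just two Gaussians with different widths, and they have different entropies, while the cat's actual weight is a single number that neither distribution is.

    import math

    def gaussian_entropy_bits(sigma):
        # Differential entropy of a Gaussian: 0.5 * log2(2*pi*e*sigma^2)
        # (differential entropy can be negative for very narrow distributions)
        return 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)

    print(gaussian_entropy_bits(0.1))  # your 13 lb cat on a scale good to ~0.1 lb
    print(gaussian_entropy_bits(0.5))  # my cat on my worse scale, ~0.5 lb of uncertainty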
All right now, here's the big question: how do you know that the evidence your sensory apparatus reveals to you is correct? What I'm getting at is this: the only experience that is directly available to you is your sensory data. And this sensory data is merely a stream of electrical impulses which stimulate your computing center.

In other words, all that I really know about the outside universe is relayed to me through my electrical connections. Why, that would mean that... I really don't know what the outside universe is like at all, for certain.
Sorry, this is a major misinterpretation, or at least a completely different one. I don't know how to put it in a more productive way; I think your comment is very confused. You don't need to run a random number generator "for a while" in order to build up a probability distribution.
This might be a frequentist vs bayesian thing, and I am bayesian. So maybe other people would have a different view.
I don't think you need to have any information to have a probability distribution; your distribution already represents your degree of ignorance about an outcome. So without even sampling it once, you already should have a uniform probability distribution for a random number generator or a coin flip. If you do personally have additional information to help you predict the outcome -- you're skilled at coin-flipping, or you wrote the RNG and know an exploit -- then you can compress that distribution to a lower-entropy one.
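A tiny numerical version of that compression (the 80% figure is invented for illustration): with no information beyond "it's a coin" you get the uniform, maximum-entropy distribution, and genuine side information gives you something lower-entropy before a single toss.

    import numpy as np

    def entropy_bits(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    print(entropy_bits([0.5, 0.5]))  # 1.0 bit: total ignorance about the flip
    # A skilled flipper who lands heads ~80% of the time assigns a
    # lower-entropy distribution to the very same toss.
    print(entropy_bits([0.8, 0.2]))  # ~0.72 bits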
But you don't need to sample the distribution to do this. You can have that information before the first coin toss. Sampling can be one way to get information but it won't necessarily even help. If samples are independent, then each sample really teaches you barely anything about the next. RNGs eventually do repeat so if you sample it enough you might be able to find the pattern and reduce the entropy to zero, but in that case you're not learning the statistical distribution, you're deducing the exact internal state of the RNG and predicting the exact next outcome, because the samples are not actually independent. If you do enough coin flips you might eventually find that there's a slight bias to the coin, but that really takes an extreme number of tosses and only reduces the entropy a tiny tiny bit; not at all if the coin-tossing procedure had no bias to begin with.
However the objective truth is just that the next toss will land heads. That's the only truth that experiment can objectively determine. Any other doubt that it might-have-counterfactually-landed-tails is subjective, due to a subjective lack of sufficient information to predict the outcome. We can formalize a correct procedure to convert prior information into a corresponding probability distribution, we can get a unanimous consensus by giving everybody the same information, but the probability distribution is still subjective because it is a function of that prior information.
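One such procedure (the one the Jaynes text below develops at length) is the maximum-entropy principle: among all distributions consistent with what you actually know, take the one with the largest entropy. Formally, maximize H = −Σ p_i ln p_i subject to Σ p_i = 1 and to whatever constraints Σ p_i f_k(x_i) = F_k encode your prior information; the Lagrange-multiplier solution is p_i ∝ exp(−Σ_k λ_k f_k(x_i)). With no constraints at all this reduces to the uniform distribution, which is why a symmetric die gets 1/6 per face.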
The best introduction that I can recommend is this typewritten PDF from E.T. Jaynes, called "Probability Theory with Applications in Science and Engineering": https://bayes.wustl.edu/etj/science.pdf.html
It requires a lot of attention to read and follow the math, but it's worthwhile. Jaynes is a pretty passionate writer, and in his writing he's clearly battling against some enemies (who might be ghosts), but on the other hand this also makes for more entertaining reading and I find that's usually a benefit when it comes to a textbook.
"Entropy is a property of matter that measures the degree of randomization or disorder at the microscopic level", at least when considering the second law.
Right, but the very interesting thing is it turns out that what's random to me might not be random to you! And the reason that "microscopic" is included is because that's a shorthand for "information you probably don't have about a system, because your eyes aren't that good, or even if they are, your brain ignored the fine details anyway."
Entropy in physics is usually the Shannon entropy of the probability distribution over system microstates given known temperature and pressure. If the system is in equilibrium then this is objective.
That's not a problem, as the GP's post is trying to state a mathematical relation, not a historical attribution. Often newer concepts shed light on older ones. As Baez's article says, Gibbs entropy is Shannon's entropy of an associated distribution (multiplied by the constant k).
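Spelled out: S_Gibbs = −k Σ p_i ln p_i, where p_i is the probability of microstate i, which is exactly k times the Shannon entropy (in nats) of the distribution over microstates.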
It is a problem because all three come with baggage. Almost none of the things discussed in this thread remain valid when discussing actual physical entropy, even though the equations are superficially similar. And then there are lots of people being confidently wrong because they assume that it's just one concept. It really is not.
Don't see how the connection is superficial. Even the classical macroscopic definition of entropy as ΔS = ∫ dQ/T can be derived from the information theory perspective, as Baez shows in the article (using entropy-maximizing distributions and Lagrange multipliers). If you have a more specific critique, it would be good to discuss.
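Roughly, that derivation goes like this (the standard canonical-ensemble version): maximize −Σ p_i ln p_i subject to Σ p_i = 1 and a fixed mean energy Σ p_i E_i = U. The Lagrange-multiplier solution is the Boltzmann distribution p_i = e^(−βE_i)/Z, giving S = −k Σ p_i ln p_i = k(βU + ln Z). Differentiate, use dU = dQ + dW with dW = Σ p_i dE_i as the work done on the system, and everything except the heat term cancels, leaving dS = dQ/T once β is identified with 1/(kT). So the macroscopic ΔS = ∫ dQ/T drops out of the information-theoretic definition; the connection is structural, not a coincidence of notation.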
In classical physics there is no real objective randomness. Particles have a defined position and momentum and those evolve deterministically. If you somehow learned these then the shannon entropy is zero. If entropy is zero then all kinds of things break down.
So now you are forced to consider e.g. temperature an impossibility without quantum-derived randomness, even though temperature does not really seem to be a quantum thing.
> If entropy is zero then all kinds of things break down.
Entropy is a macroscopic variable, and if you allow microscopic information, strange things can happen! You can move from a high-entropy macrostate to a low-entropy macrostate if you choose the initial microstate carefully. But this is not a reliable process which you can reproduce experimentally, i.e. it is not a thermodynamic process.
A thermodynamic process P is something which takes a macrostate A to a macrostate B, independent of which microstate a0, a1, a2, ... in A you started off with. If the process depends on the microstate, then it wouldn't be something we would recognize, as we are looking from the macro perspective.
Which we don’t know precisely. Entropy is about not knowing.
> If you somehow learned these then the shannon entropy is zero.
Minus infinity. Entropy in classical statistical mechanics is proportional to the logarithm of the volume in phase space. (You need an appropriate extension of Shannon’s entropy to continuous distributions.)
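In symbols (the standard continuous form): S = −k ∫ ρ ln ρ dΓ, integrating over the 6N-dimensional phase space, up to the usual h^3N normalization that keeps things dimensionless. For ρ uniform over a phase-space volume V this works out to k ln V plus a constant, which is the "logarithm of the volume" statement above, and it diverges to minus infinity as V shrinks toward a single point.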
> So now you are forced to consider e.g. temperature an impossibility without quantum-derived randomness
> Which we don’t know precisely. Entropy is about not knowing.
No, it is not about not knowing. This is an instance where the intuition from Shannon's entropy does not translate to statistical physics.
It is about the number of possible microstates, which is completely different. In physics, entropy is a property of a bit of matter; it is not related to the observer or their knowledge. We can measure the enthalpy change of a material sample and work out its entropy without knowing a thing about its structure.
> Minus infinity. Entropy in classical statistical mechanics is proportional to the logarithm of the volume in phase space.
No, 0. In this case, there is a single state with p=1 and S = - k Σ p ln(p) = 0.
This is the same if you consider the phase space because then it is reduced to a single point (you need a bit of distribution theory to prove it rigorously but it is somewhat intuitive).
The probability p of a microstate is always between 0 and 1, therefore p ln(p) is never positive and S is always non-negative.
You get the same using Boltzmann’s approach, in which case Ω = 1 and S = k ln(Ω) is also 0.
> (You need an appropriate extension of Shannon’s entropy to continuous distributions.)
>>> Particles have a defined position and momentum [...] If you somehow learned these then the shannon entropy is zero.
>> Entropy in classical statistical mechanics is proportional to the logarithm of the volume in phase space [and diverges to minus infinity if you define precisely the position and momentum of the particles and the volume in phase space goes to zero]
> [It's zero also] if you consider the phase space because then it is reduced to a single point (you need a bit of distribution theory to prove it rigorously but it is somewhat intuitive).
> The probability p of an microstate is always between 0 and 1, therefore p ln(p) is always negative and S is always positive.
The points in the phase space are not "microstates" with probability between 0 and 1. It's a continuous distribution and if it collapses to a point (i.e. you somehow learned the exact positions and momentums) the density at that point is unbounded. The entropy is also unbounded and goes to minus infinity as the volume in phase space collapses to zero.
You can avoid the divergence by dividing the continuous phase space into discrete "microstates" but having a well-defined "microstate" corresponding to some finite volume in phase space is not the same as what was written above about "particles having a defined position and momentum" that is "somehow learned". The microstates do not have precisely defined positions and momentums. The phase space is not reduced to a single point in that case.
If the phase space is reduced to a single point I'd like to see your proof that S(ρ) = −k ∫ ρ(x) log ρ(x) dx = 0
I hadn't realized that "differential" entropy and Shannon entropy are actually different and incompatible, huh.
So the case I mentioned, where you know all the positions and momentums, has 0 Shannon entropy and -Inf differential entropy. And a typical distribution will instead have Inf Shannon entropy and finite differential entropy.
Wikipedia has some pretty interesting discussion about differential entropy vs the limiting density of discrete points, but I can't claim to understand it and whether it could bridge the gap here.
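The rough bridge is that discretizing a continuous density with bin width Δ gives a discrete Shannon entropy of about h − log2(Δ), where h is the differential entropy; that bin-width term is what the limiting-density construction keeps track of, and it is what blows up as Δ → 0. A quick numerical check on a standard normal (the choice of distribution is just for illustration):

    import numpy as np

    h_normal = 0.5 * np.log2(2 * np.pi * np.e)  # differential entropy of N(0,1), in bits

    for delta in [1.0, 0.1, 0.01]:
        centers = np.arange(-10, 10, delta) + delta / 2
        p = np.exp(-centers ** 2 / 2) / np.sqrt(2 * np.pi) * delta  # bin probabilities
        p /= p.sum()
        shannon = -(p * np.log2(p)).sum()
        print(f"delta={delta}: Shannon={shannon:.3f} bits, h - log2(delta)={h_normal - np.log2(delta):.3f}")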
Quantum mechanics solves the issue of the continuity of the state space. However, as you probably know, in quantum mechanics all the positions and momentums cannot simultaneously have definite values.
> In Physics, entropy is a property of a bit of matter, it is not related to the observer or their knowledge. We can measure the enthalpy change of a material sample and work out its entropy without knowing a thing about its structure.
Enthalpy is also dependent on your choice of state variables, which is in turn dictated by which observables you want to make predictions about: whether two microstates are distinguishable, and thus whether they are part of the same macrostate, depends on the tools you have for distinguishing them.
A calorimeter does not care about anyone’s choice of state variables. Entropy is not only something that exists in abstract theoretical constructs, it is something we can get experimentally.
If information-theoretical and statistical mechanics entropies are NOT the same (or at least, deeply connected) then what stops us from having a little guy[0] sort all the particles in a gas to extract more energy from them?
Sounds like a non-sequitur to me; what are you implying about the Maxwell's demon thought experiment vs the comparison between Shannon and stat-mech entropy?
Yeah, but distributions are just the accounting tools to keep track of your entropy. If you are missing one bit of information about a system, your understanding of the system is some distribution with one bit of entropy. Like the original comment said, the entropy is the number of bits needed to fill in the unknowns and bring the uncertainty down to zero. Your coin flip may be unknown in advance to you, and thus you model it as a 50/50 distribution, but in a deterministic universe the bits were present all along.
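A toy version of that accounting (a 3-bit example, purely for illustration): the entropy of your distribution is exactly the number of bits of hidden state you still have to be told, even though the state itself was fixed all along.

    import math

    # A hidden 3-bit register is fully determined, but unknown to you.
    # Your distribution over the remaining possibilities carries exactly
    # as many bits of entropy as are still needed to pin the state down.
    for learned in range(4):
        remaining_states = 2 ** (3 - learned)
        entropy = math.log2(remaining_states)  # entropy of a uniform distribution
        print(f"{learned} bits learned -> {entropy:.0f} bits of entropy left")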