Edit: first published Feb 12, 2024; revised after fresh thoughts on tension (see Tension: the Stuff of Life) rebuilt my entire definition of information.
Information, as defined by Claude Shannon, is a relative measure comparing a message to the universe of possible messages. If you’re playing hide-and-seek in a room with two cupboards, finding the person is very different from finding them in a whole house, or a whole country.
The bounds of possibility determine how much information is in a given piece of text/bits/pixels. People often talk of information in terms of messages, where a message is composed of letters chosen from an alphabet. One can also talk in terms of words, which are chosen from a dictionary of all possible words. If you say to a child, “that fire is hot,” but the child doesn’t know words like “dangerous” or “warm” or “burning,” the child will not know whether or not it is safe to approach the fire.
The larger the possible dictionary, the more information is in a given word choice, because we learn from the omission of things just as we do from the presence of other things. Of course, the size of the dictionary is in the eye (or mind) of the beholder. It is the child who must know more words in order to receive more information from yours.
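The room-versus-house comparison can be made concrete. A minimal sketch (the hiding-spot counts are illustrative): with equally likely hiding places, the information revealed by finding the person is the base-2 logarithm of the number of places.

```python
import math

def self_information(n_outcomes: int) -> float:
    """Bits of information revealed when one of n equally likely outcomes is specified."""
    return math.log2(n_outcomes)

# Hide-and-seek in a room with two cupboards: finding them reveals 1 bit.
bits_room = self_information(2)
# The same game across a house with 32 possible hiding spots: 5 bits.
bits_house = self_information(32)
```

The same discovery carries more information when the universe of possibilities is larger, which is exactly Shannon’s relative measure at work.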
The dictionary is characterized by tension. For something to be in the dictionary, it must be possible. If there is no potential for something, then it carries no possible information (see Tension: the Stuff of Life).
Because we exist at a specific point in time, the future is to us a universe of possibility: a mega-dictionary, if you will. And over time, those possibilities converge into one specific future state. So the unfolding of time is constantly providing us with information.
New information, then, is a shrinking of the possible into the specific. In that sense, it is the act of making choices, reducing uncertainty, and turning a myriad of potential into the actual.
I use the word uncertainty sparingly, because I find it a tricky one to wrestle with. Who is uncertain, and about what? What is the difference between certainty and confidence? And many more. But what’s crucial is that there is a potential, a tension, in the set of possible outcomes, that makes it uncertain. And over time, as that set grows smaller, the tension is released, the uncertainty is reduced, and information is revealed.
Sometimes we want to know what the future state will be, before that information is revealed (the answer). We then take what we do know, which includes the set of possible outcomes (the question). To bridge the gap (or tension) between question and answer, we create a model that unfolds on its own to reveal its own information, a universe of our own creation. A model is a process (what is assumed about how things work, which includes the set of possible future options), that is then fed some data, and outputs something that is more specific than the initial set of possible outcomes. For example, it might give a probability distribution of the future outcomes, with confidence intervals to indicate where it was unable to specify beyond a certain range. That result is our prediction of the answer.
The level of specificity is the model’s confidence. A model that can specify from potential all the way to fully specific is the most confident, while a less confident model will only specify to a range (the confidence interval).
A model is in fact a limited unfolding of a universe of our own creation. The usefulness of a model lies in how closely it mimics the behavior of the phenomenon its creator is trying to predict. In that sense, a model is a mechanism which creates its own information, and we judge that information by how relevant it is to the information we are seeking in the real world.
There are many ways for a model to be a bad model.
A false assumption, treating something wrong as right, is the greatest danger to a model. This is why more data does not always make a model better. The set of possible outcomes, for example, is itself an assumption. The assumption of a smaller dictionary might lead someone to false confidence in the information of a few words. Famously, if you think there are only white swans in the world, seeing yet another white swan will in fact make your swan-predictive model worse (less predictive of the real world), if you don’t know about the existence of black swans. This is roughly why big data can be misleading - it holds the potential to accentuate confidence in models that are wrong. Hubris is the denial of the possibility of false assumptions.
Historically, our intuition was our main predictive model. The success of those predictions determined the survivability of the actor and their genes, because good prediction meant people were able to better prepare for change. Intuition and other predictive models allow us to act when some important real-world information is not yet revealed. Human intuition has used other mechanisms, such as myth and vigilance, to avoid hubris. Statistical models can be more rigorous and offer better predictive power in certain realms, but their inbuilt hubris means that their misapplication in the wrong realms is very dangerous.
We separate, then, between rigorous models and good models. A rigorous model is one that incorporates all relevant data known at the time of the prediction and models the mechanisms between that data correctly. A good model, however, is one that has better long-term predictive ability by decreasing hubris. A good model is one that actually understands how good its own predictive ability is.
A single outcome cannot prove that a predictive model is good, but it can prove a model bad if the actual outcome was not even within the model’s universe of possibilities. Accumulated over time, outcomes can be used to prove, disprove, or improve a model.
When it comes to the unknown, some things are random - they cannot be predicted with any measure of confidence, as no model can provide enough specificity to match the real-world outcomes. However, in some cases, better measurement and understanding can add specificity to a model, thus decreasing the randomness of the real-world phenomena. Perfectly measuring the weight distribution of a coin may be key to a perfect model that predicts the coin flips. It is possible, however, that some things are truly random, and no amount of measurement will ever allow prediction.
The more randomness in a phenomenon, the harder it is to model usefully. However, even phenomena with an element of randomness can be bounded. A coin flip may have a random outcome, but we know it will always yield heads or tails. Thus the model for a coin may still be somewhat useful (we are very confident about the set of possible outcomes, just not the specific outcome of the next flip - although we are confident about the outcomes over a large enough number of flips).
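A short simulation (seeded so it is repeatable) illustrates the two levels of confidence: no model predicts the next flip, yet the set of outcomes and the long-run frequency are tightly bounded.

```python
import random

random.seed(0)  # seeded so the sketch is repeatable

def flip() -> str:
    """One fair coin flip: the specific outcome is unpredictable."""
    return random.choice(["heads", "tails"])

# The set of possible outcomes is known with total confidence,
# and so is the long-run frequency, even though no single flip is.
n = 100_000
heads = sum(flip() == "heads" for _ in range(n))
frequency = heads / n  # very close to 0.5
```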
Chaos is a subset of temporary randomness. Chaos (as seen in the double pendulum example) occurs when a tiny, tiny variance in initial conditions leads to huge disparities in outcome. In theory, two double pendulums with precisely the same starting conditions will follow the same path. The problem is that it is almost impossible for us to measure to that degree of precision. Until our measurements catch up, we treat chaos as permanent randomness.
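A compact double pendulum simulation needs an ODE solver, so here is the same sensitive dependence shown with a standard stand-in, the chaotic logistic map (this substitution is mine, not the essay’s):

```python
def logistic_orbit(x: float, steps: int) -> float:
    """Iterate the chaotic logistic map x -> 4x(1 - x)."""
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
    return x

# Two 'pendulums' whose starting conditions differ by less than any
# realistic measurement could detect:
a = logistic_orbit(0.2, 50)
b = logistic_orbit(0.2 + 1e-10, 50)
gap = abs(a - b)  # the tiny initial difference has been amplified enormously
```

The system is perfectly deterministic, yet any measurement error below one part in ten billion still yields wildly different trajectories; until measurement catches up, the behavior looks like randomness.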
When someone takes an action based on a prediction, they are making a bet by attaching something they value to that prediction. A bet carries a set of possible payoffs for the different potential outcomes: the positive ones are upside, and the negative ones are downside.
Risk is the map of upside and downside. If no future outcome results in a negative consequence or loss to the player of something of value, then there is no risk. Risk is intrinsic to the bet and the possible outcomes, not to the model. However, since the real set of possible outcomes is not known (the information is not yet revealed), all risk is perceived risk, relative to one’s assumptions about the set of possible outcomes and their model’s prediction of those outcomes.
A model might be incredibly predictive at one level of specificity, but not another. In that sense, the model’s confidence allows for risk to be aggregated and disaggregated. This is the foundation of insurance: at an individual level, it is very difficult to model the probability of a car accident, and the downside can be quite steep. However, it is much easier to model the probabilities of car accidents over larger populations. The individual, then, is willing to trade money to decrease their downside, and the insurer is willing to take that money in exchange for a financial guarantee. In this way, two actors can create a market by making two separate bets, even though neither disagrees with the other’s model (scale-divide markets).
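A minimal sketch of that aggregation, with made-up numbers for accident probability and loss: one driver’s yearly loss swings wildly, while the average loss across a large pool barely moves.

```python
import random
import statistics

random.seed(1)  # seeded so the sketch is repeatable

P_ACCIDENT = 0.02  # assumed yearly accident probability (illustrative)
LOSS = 10_000      # assumed cost of one accident (illustrative)

def yearly_loss() -> int:
    """One driver's loss in one year."""
    return LOSS if random.random() < P_ACCIDENT else 0

def per_driver_losses(pool_size: int, years: int = 500) -> list[float]:
    """Average loss per driver in each simulated year."""
    return [sum(yearly_loss() for _ in range(pool_size)) / pool_size
            for _ in range(years)]

# One driver: outcomes jump between 0 and 10,000.
spread_single = statistics.stdev(per_driver_losses(1))
# A pool of 2,500 drivers: the per-driver average is far more stable.
spread_pooled = statistics.stdev(per_driver_losses(2_500))
```

The same risk that is nearly unmodelable for the individual becomes highly predictable at scale, which is what makes the insurance trade sensible for both sides.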
A market can also be formed by two parties with diverging predictive models (differing-model markets). The models require subjective assumptions and also process data differently. Two traders might have competing predictions on the future of a stock, so they will each take opposite sides of a bet: one will sell, predicting the price to move down, while the other will buy, predicting its rise.
A scale-divide market is a positive-sum game, where all actors benefit if the information matches the models. A differing-model market is a zero-sum game, since the revealed information will more closely match one model than another, meaning one actor will get upside while the other gets downside.
The stock market is a collection of many, many markets of various kinds. The models of players here might be very complex, making assumptions about such things as the models of other players, processing different data, or processing the same data differently. Without these differing models, much (though not all) of the financial markets would not exist. Every one of these models considers real-world information as soon as it is revealed. In this sense, new data is always priced in, but there is no guarantee that the models are good ones, which is to say that the new data is priced in correctly.
Efficient market theory claims that these models may all be wrong, but that their aggregate, the weighted average of all the models as reflected in price, is the best predictive model given everything that is known. George Soros counters that blind faith in efficient markets is itself a wrong assumption embedded in so many predictive models that, in many cases, markets are deeply inefficient (but more on this later).
A bet can be wise, foolish, conservative, aggressive. This depends on the size of the bet and how valuable that stake is to the player (especially relative to their ‘hand’ or ‘portfolio’ or ‘net worth’), as well as the model they are using to choose that bet.
For example, putting all one’s eggs in one basket (metaphorically or literally) when one is not very confident about something would be considered foolish. It would be especially foolish if one is operating on a model with a very poor prediction track record.
However, it’s important to note that the value judgement falls not on the game, but on the bet (and by extension, the player making the bet). Gambling is generally frowned upon by major religions, but poker is not inherently gambling: one person might make a series of wise bets at a poker table, while sitting beside someone making very, very foolish ones.
A bet can also reflect the risk appetite of the player. There is a band of wisdom that encompasses a range of risk appetite. Players can fall anywhere in this band and still be making wise bets. However, risk appetite might fall too low, out of fear, or too high, out of greed. All of which is to say: there is a range of wise risk appetite, but it is fairly narrow compared to the full range of bets one could be making.
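One classical way to pin down that band, not named in the essay but in the same spirit, is the Kelly criterion: the bankroll fraction that maximizes long-run growth under a given model of the bet.

```python
def kelly_fraction(p_win: float, payout: float) -> float:
    """Kelly criterion: the bankroll fraction to stake on a bet that pays
    `payout` per unit staked with probability `p_win`, else loses the stake."""
    return p_win - (1.0 - p_win) / payout

# A bet you believe wins 60% of the time at even money:
f = kelly_fraction(0.6, 1.0)  # stake 20% of the bankroll
# Staking far above f is greed (it courts ruin); far below is fear
# (it forgoes growth). Both fall outside the band of wise risk appetite.
```

Note that the result is only as good as the model feeding it: a hubristic estimate of `p_win` turns a "wise" Kelly bet into a foolish one.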