Information, as defined by Claude Shannon, is a relative measure of what is known versus what is unknown. The bounds of the unknown (themselves a piece of information) determine how much information is carried by what is known: a specific word carries more information when it was chosen over 10 other words that describe something similar but not exactly the same (we can learn from the omission of things just as much as from their presence). New information, then, is a reduction in uncertainty.
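To make this concrete, Shannon's measure of the information (the "surprisal") of an outcome $x$ with probability $p(x)$ is

$$I(x) = -\log_2 p(x) \ \text{bits.}$$

A word chosen from 11 equally likely candidates (the word plus its 10 near-synonyms) conveys $\log_2 11 \approx 3.46$ bits; a word chosen from only 2 candidates conveys just 1 bit. The size of the candidate set - the bounds of the unknown - sets how much the observation tells us.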
Sometimes we need to take some action based on some value (the answer) that we do not yet know. We can take what we do know (the question) and turn it into a model that makes a prediction about the answer. Confidence is a property of such a model, and thus relative to the actor. The confidence interval represents the model's own estimate of how much is already known and how much is still needed to pin down the answer.
For example, when trying to predict some quantity, a model will output a probability distribution (a prediction that includes a confidence interval) based on everything that is known. "Everything that is known" is both data and mechanism (how the data affects the outcome).
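A minimal sketch of this in Python (the Gaussian "mechanism" and all names here are illustrative assumptions, not anything the text prescribes):

```python
import statistics

def predict(observations, z=1.96):
    """A toy predictive model: given past observations (the data) and an
    assumed Gaussian mechanism (how the data relates to the outcome),
    output a point prediction plus a rough 95% interval.
    The Gaussian assumption is itself one of the things treated as known."""
    mean = statistics.mean(observations)
    sd = statistics.stdev(observations)
    return mean, (mean - z * sd, mean + z * sd)

# Example: predicting tomorrow's commute time from past commutes (minutes).
times = [31, 28, 35, 30, 33, 29, 32]
point, interval = predict(times)
print(f"prediction: {point:.1f} min, interval: ({interval[0]:.1f}, {interval[1]:.1f})")
```

The interval is the model's admission of what it does not know; a different assumed mechanism would produce a different interval from the same data.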
Predictive models are built on many assumptions, each of which can be right or wrong. Generally, these assumptions are things treated as known, which the model holds with very high confidence.
A false assumption - treating something wrong as right - is the greatest danger to a model. This is why knowing more (aka more data) does not always make a model better. Remember, not all data increases information - data only becomes information relative to an understanding of how much remains unknown. For example, an assumption of a too-small alphabet can produce false confidence in the information carried by a few letters. Seeing another white swan will in fact make your swan-predicting model worse if you don't know about the existence of black swans. This is roughly why big data can be misleading - it holds the potential to accentuate confidence in models that are wrong. Hubris is the denial of the possibility of false assumptions.
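A worked version of the alphabet example: suppose you believe messages use a uniform 16-letter alphabet, so each unseen letter represents $\log_2 16 = 4$ bits of uncertainty. After seeing 5 letters of an 8-letter message, you would believe only $3 \times 4 = 12$ bits remain unknown. If the true alphabet has 26 letters, the remaining uncertainty is really $3 \times \log_2 26 \approx 14.1$ bits: the too-small-alphabet assumption quietly inflates confidence, just as the all-swans-are-white assumption does.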
Historically, our intuition was the main predictive model; the success of its predictions determined the survivability of the actor and their genes. Intuition and other predictive models allow us to act while something is still unknown. Human intuition has used other mechanisms, such as myth and vigilance, to avoid hubris. Statistical models can be more rigorous and offer better predictive power in certain realms, but their inbuilt hubris means that misapplying them in the wrong realms is very dangerous.
We can then separate rigorous models from good models. A rigorous model is one that incorporates all relevant data known at the time of the prediction and models the mechanisms between that data correctly. A good model, however, is one with better long-term predictive ability because it decreases hubris: a good model actually understands how much information it has.
A single outcome cannot prove that a predictive model is good, but it can prove that a model is bad. Over time, accumulated outcomes can be used to support, disprove, or improve a model.
When it comes to the unknown, some things are random - they cannot be predicted with any measure of confidence. However, prediction and confidence are relative to the actor, so in some cases better measurement and understanding can decrease randomness: perfectly measuring the weight distribution of a coin may yield perfect predictions of its flips. It is possible, however, that some things are truly random, and that no amount of measurement will ever allow prediction.
The more randomness in a system, the harder it is to make accurate predictions. However, even systems with an element of randomness can be bounded. A coin flip may have a random outcome, but we know it will always yield heads or tails.
Chaos is a subset of temporary randomness. Chaos (as seen in the double pendulum example) occurs when a tiny, tiny variance in initial conditions leads to huge disparities in outcome. In theory, two double pendulums with precisely the same starting conditions will follow the same path; the problem is that it is almost impossible for us to measure to that degree of precision. Until our measurements catch up, we treat chaos as permanent randomness.
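A minimal simulation sketch makes this visible (standard double pendulum equations of motion with RK4 integration; the unit masses and lengths and the 1e-9 radian offset are illustrative choices): two pendulums that start a billionth of a radian apart track each other for a while, then diverge completely.

```python
import math

G, M1, M2, L1, L2 = 9.81, 1.0, 1.0, 1.0, 1.0

def derivs(state):
    """Equations of motion for a double pendulum (standard textbook form).
    state = (theta1, omega1, theta2, omega2)."""
    t1, w1, t2, w2 = state
    d = t1 - t2
    den = 2 * M1 + M2 - M2 * math.cos(2 * d)
    a1 = (-G * (2 * M1 + M2) * math.sin(t1)
          - M2 * G * math.sin(t1 - 2 * t2)
          - 2 * math.sin(d) * M2 * (w2 * w2 * L2 + w1 * w1 * L1 * math.cos(d))) / (L1 * den)
    a2 = (2 * math.sin(d) * (w1 * w1 * L1 * (M1 + M2)
          + G * (M1 + M2) * math.cos(t1)
          + w2 * w2 * L2 * M2 * math.cos(d))) / (L2 * den)
    return (w1, a1, w2, a2)

def rk4_step(state, dt):
    """One classical Runge-Kutta step."""
    k1 = derivs(state)
    k2 = derivs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = derivs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = derivs(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Two pendulums whose initial angles differ by one billionth of a radian.
state_a = (math.pi / 2, 0.0, math.pi / 2, 0.0)
state_b = (math.pi / 2, 0.0, math.pi / 2 + 1e-9, 0.0)

dt = 0.001
for step in range(1, 20001):
    state_a, state_b = rk4_step(state_a, dt), rk4_step(state_b, dt)
    if step % 4000 == 0:
        gap = abs(state_a[0] - state_b[0])
        print(f"t = {step * dt:4.1f}s  angle gap = {gap:.2e} rad")
```

The gap grows roughly exponentially until it saturates at the size of the motion itself - the signature of chaos, and the reason any imprecision in measurement eventually swallows the prediction.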
Risk is the upside and downside potential associated with the future (or with any uncertainty) - the attachment of something of value to a prediction. In a fully certain world, where all is known, there is no risk. Uncertainty can be aggregated and disaggregated, so risk in markets can be as well. This is the foundation of insurance: the downside of an individual car accident is high, but the probability of it happening to a single person is difficult to compute, so the individual is willing to trade money to decrease that downside. The number of accidents across a million drivers, however, is easier to compute, so an insurer is willing to take that money in exchange for a financial guarantee. In this way, uncertainty decreases as data is aggregated, creating a market.
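A minimal sketch of that aggregation (the 3% yearly accident rate and 20,000 cost are illustrative assumptions, not real actuarial figures):

```python
import random

P_ACCIDENT = 0.03   # assumed yearly accident probability per driver (illustrative)
COST = 20_000       # assumed cost of one accident (illustrative)

def yearly_loss(n_drivers):
    """Total accident losses for a pool of n_drivers in one simulated year."""
    accidents = sum(random.random() < P_ACCIDENT for _ in range(n_drivers))
    return accidents * COST

random.seed(1)

# A single driver: the loss is all-or-nothing and hard to plan for.
print("ten years, one driver:", [yearly_loss(1) for _ in range(10)])

# A pool of a million drivers: the per-driver loss clusters tightly
# around P_ACCIDENT * COST = 600, so the risk becomes priceable.
for year in range(3):
    pool = 1_000_000
    print(f"year {year}: per-driver loss across {pool:,} drivers = {yearly_loss(pool) / pool:.2f}")
```

The individual's outcome stays binary and unpredictable, but the pooled per-driver average is stable enough to price - which is exactly the trade the insurer sells.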
The market is a reflection of many, many different predictive models. These models inherently require subjective assumptions (because they are, in part, based on predictions of what others are thinking and planning). These models process data differently, and incorporate uncertainty differently into their predictions (for example, the majority of models may treat all randomness the same way, without correctly considering its bounds, and thus over-punish it). Without differing models, there would be no markets. New data is then always priced in, but there is no guarantee that it is priced in correctly.
Efficient market theory claims that these models may all be wrong, but that their aggregate - the weighted average of all of the models, reflected in price - is the best predictive model based on everything that is known. George Soros counters that blind faith in efficient markets is itself a false assumption embedded in so many predictive models that, in many cases, markets are deeply inefficient (but more on this later).