Course 1 Lessons
Value-focused Definition of Dark Data:
Data in any form, whether or not it has been obtained, that is needed but is not creating or providing value or impact. This results from poor positioning, inability, illiteracy, obliviousness, or any other unintentional factor.
Value-focused Definition of Data Products:
A tool or asset that defines its end user utility by creating or providing actionable impact or value, primarily based on the application of its constantly changing and evolving data, transformed into useful information, that does something for an end user or enables an end user to do something better.
Planet Vulcan & The Leaning Tower of Pisa
Dark Data Case Studies
Two real-world examples of major failures caused by Dark Data are the Leaning Tower of Pisa and the supposed discovery of the planet Vulcan.
In 1859, Urbain Le Verrier, a French astronomer, published a paper correctly arguing that Mercury’s orbit was anomalous and could not be fully explained by Newton’s law of gravitation. He extended his hypothesis by suggesting that the anomaly could be explained if there were another planet between Mercury and the Sun, and he proposed that this planet be named Vulcan, after the Roman god of fire. His claims were taken seriously because he had played a critical role in the discovery of Neptune. Later that same year, he received correspondence from Edmond Lescarbault, a French physician and amateur astronomer, who claimed to have seen a planet between Mercury and the Sun. Le Verrier visited Lescarbault and was convinced of his finding. Early in 1860, Le Verrier officially announced the discovery of Vulcan in Paris, based on Lescarbault’s observations. Other observers continued to send in results claiming to confirm Vulcan’s existence until the early 20th century. In 1915, Albert Einstein published the General Theory of Relativity, which clearly explained the variances in Mercury’s orbit. Einstein’s work dispelled the notion that there was a planet Vulcan.
In the case of the planet Vulcan, the Dark Data was Einstein’s General Theory of Relativity. It was not known to Le Verrier, and Le Verrier was not curious enough to fully test his hypothesis. Instead, he jumped to speculation based on the knowledge that he had at hand. Le Verrier believed he knew something that was actually unknown, and he was unaware that it was unknown. This scenario plays out in many modern projects and initiatives. In some cases, people operate with only the knowledge that is available, regardless of whether the facts align with what they believe to be true. This can cause serious disasters and erode the value being sought.
In the second case, that of the Leaning Tower of Pisa, the Dark Data revealed itself as a lack of knowledge and improper goal setting. These knowledge gaps are the reason the tower is known today as a tourist site rather than for its original purpose. In the 12th century AD, when ground was broken for a free-standing bell tower in the town of Pisa, Italy, people did not have a solid grasp of the concepts of geotechnical engineering. This was Dark Data at the time. The tower was built on soggy soil, a critical feature that was overlooked. Furthermore, the base was too shallow to support a tower of its size. Although soil seems fairly straightforward to most people, it is one of the most complex environments that engineers have to deal with, because its composition varies from location to location. In tall structures such as towers, obelisks, and skyscrapers, wind dynamics can also affect stability, which in turn affects the foundation in the soil. If enough force is applied at the top, the ground below can shift and reform, reducing its bearing capacity, the measure of how much applied force the ground can handle. The soil’s shear strength was never assessed in terms of viscosity, cohesion, mineral composition, or grain size, so there was no way to understand the depth or width needed for the foundation.
The Leaning Tower of Pisa was built in multiple phases over the course of about 200 years. It began to sink after construction of the second floor began, causing construction to halt for nearly 100 years while the builders hoped that the ground would settle underneath the mere three-foot-deep base. Construction resumed in 1272 and continued until 1284, when Pisa was defeated by the Genoese. During that period, the builders began constructing the upper floors with a slant toward the higher side to compensate for the lean. To this day, the tower is curved because of this. In 1319, the seventh floor was completed, followed by the completion of the bell chamber in 1372.
Since its completion, the tower has remained leaning, and the lean continued to worsen. In the 1990s, the tower was closed to tourists for fear that it might fall. Several efforts were made to reduce the tilt. However, none succeeded until engineers placed weights at the base of the high side and began slowly removing ground from underneath it. This caused the high side to sink slowly, reducing the overall tilt. The tower is now said to be safe for at least another 200 years.
To make sense of the unknowns, data analysts often use a framework known as the “Johari Window.” The Johari Window consists of four quadrants, each representing a different combination of awareness and knowledge. The first quadrant is the “Known Knowns,” or the information that we are aware of and can easily access. This information is the easiest to analyze and interpret, and it forms the basis for many data-driven decisions.
The second quadrant is the “Known Unknowns,” or the information that we are aware of but do not have access to. This could include data that is not available to us, or information that is hidden behind a paywall, firewall or other restriction. Data analysts often try to find ways to access this information, as it can provide valuable insights that are not available elsewhere.
The third quadrant is the “Unknown Knowns,” or the information that we possess but are not aware of. This could include biases or assumptions that we hold unconsciously, or data that we have collected but have not yet analyzed. It can be difficult to uncover these unknown knowns, but doing so can lead to significant breakthroughs in understanding.
Finally, the fourth quadrant is the “Unknown Unknowns,” or the information that we don’t know we don’t know. This represents the true mysteries of the data world, the surprises that can emerge unexpectedly and upend our assumptions. Analysts must be prepared to face these unknown unknowns and adapt their models and methods as new information emerges.
The Johari Window provides a useful way of thinking about the unknowns in data analysis. By understanding the different types of unknowns and developing strategies for uncovering them, analysts can gain a more complete picture of the data and make more informed decisions. It is especially useful for data because dark data is a relative categorization, and the Johari Window is centered on knowledge being relative to oneself.
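The four quadrants described above can be sketched as a small lookup. This is only an illustration: the names `Quadrant` and `classify`, and the two yes/no questions (are we aware of the information, and do we possess or have access to it), are assumptions chosen to mirror the quadrant descriptions in this lesson.

```python
from enum import Enum


class Quadrant(Enum):
    KNOWN_KNOWN = "Known Knowns"
    KNOWN_UNKNOWN = "Known Unknowns"
    UNKNOWN_KNOWN = "Unknown Knowns"
    UNKNOWN_UNKNOWN = "Unknown Unknowns"


def classify(aware: bool, possessed: bool) -> Quadrant:
    """Place a piece of information in one of the four quadrants.

    aware:     we know that the information exists (hypothetical flag)
    possessed: we hold or can access the information (hypothetical flag)
    """
    if aware and possessed:
        return Quadrant.KNOWN_KNOWN       # aware of it and can access it
    if aware:
        return Quadrant.KNOWN_UNKNOWN     # aware of it but cannot access it
    if possessed:
        return Quadrant.UNKNOWN_KNOWN     # we hold it but are not aware of it
    return Quadrant.UNKNOWN_UNKNOWN       # we don't know that we don't know


# Example: data that was collected but never analyzed is held but unnoticed.
print(classify(aware=False, possessed=True).value)
```

Because the quadrant depends on who is asking, the same dataset can land in different quadrants for different teams, which is the sense in which dark data is relative to the observer.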
The Johari Window’s four quadrants are widely accepted as a means of gaining clarity in areas that can be difficult to comprehend. However, some areas of the unknown require a more rigorous approach. They involve a mix of data, opinions, information, intelligence, and facts that are logically pieced together and believed to be true but may not be. Fundamentally, to believe something implies the possibility of doubt or error; otherwise it would be fact. In this part of our knowledge, we do not know that what we believe we know, we actually do not know. This can be slightly unsettling to work through mentally. It is also a slightly different view of the unknown unknown: rather than not knowing what we don’t know, we do not know that we are wrong about what we believed we knew. That is a complete change in how a situation is framed.
This distinction is crucial, as our decision-making and problem-solving approaches differ significantly between a problem that is caused by something that is truly unknown, versus one that is caused by unknown assumptions or beliefs (where we believe we know but are unaware that we do not). This discrepancy greatly impacts our outcomes and our value proposition.
An unknown assumption is a conclusion or belief that a person holds to be true without being aware that it is an assumption that may not be factual. It is typically based on incomplete or insufficient information, which leaves its accuracy uncertain. The person has not consciously recognized the conclusion as an assumption rather than a fact, and therefore may not have critically examined its validity or accuracy. Unknown assumptions can influence a person’s thoughts, behaviors, and decisions without them realizing it, and can lead to preconceptions or incorrect conclusions. Identifying and examining unknown assumptions can help individuals better understand how they choose to solve problems and set goals, and can improve their decision-making processes. The data associated with unknown assumptions is the darkest and the most impactful. Therefore, it is imperative to explore strategies and tactics for handling dark data to improve our ability to succeed.
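One way to make this distinction concrete is to record, for each belief, whether it has been verified and whether its holder recognizes it as an assumption. The sketch below is a hypothetical illustration only: the `Belief` fields and the three labels are assumptions introduced here, not an established taxonomy.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    statement: str
    verified: bool                  # has the belief been tested against facts?
    recognized_as_assumption: bool  # does the holder know it is an assumption?


def label(belief: Belief) -> str:
    """Label a belief as a fact, a known assumption, or an unknown assumption."""
    if belief.verified:
        return "fact"
    if belief.recognized_as_assumption:
        return "known assumption"
    # Unverified and not recognized as an assumption: the darkest case.
    return "unknown assumption"


# Example drawn from the Vulcan case study earlier in this lesson.
vulcan = Belief(
    statement="An undiscovered planet explains Mercury's orbit",
    verified=False,
    recognized_as_assumption=False,
)
print(label(vulcan))
```

Simply asking of each belief whether it is an unverified assumption moves it from the last category into the second, where it can be examined and tested.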