Data Visualization

From SI410
Revision as of 20:32, 22 January 2022 by Abbysyp (Talk | contribs)



Data visualization (also called information visualization or statistical visualization) is the design, development, and application of computer-generated graphical representations of data[1]. Today, computers can process and display large amounts of data in a way that is efficient, easily accessible, and understandable. The human mind is visual by nature. As a result, visualization is found everywhere, from lines and points on a graph to the standardized symbols called emojis. Whether the underlying information is strictly quantitative data or an individual's wish to convey a certain emotion, data visualization is, on a basic level, a method of communicating information and ideas[2].

As the amount of data accumulated since the rise of the Internet increasingly dwarfs what existed before, the need to wrangle, process, and analyze this information has grown as well. Large industries and organizations particularly value the tools used to represent data because they enable decision makers to comprehend information and form an opinion in an efficient, profitable manner[3]. Scientists must choose how to represent their data and consider the audience they intend to show it to, because data visualization influences how people make sense of the information before them[4]. The processes and decisions that go into creating these representations have raised ethical concerns regarding fairness, bias, and integrity.

History

The first visual representation of statistical data is believed to have been created by Flemish astronomer Michael Florent van Langren in 1644. His line graph records the twelve estimates known at the time of the difference in longitude between Rome and Toledo, along with the name of each astronomer who provided an estimate. The visualization was notable for its portrayal of the wide variation among the estimates[5]. In the 18th century, thematic mapping originated in an attempt to catalogue geologic, economic, and medical data, which introduced abstract graphs of functions, measurement error, and collection of empirical data.

Minard's chart from the 1800s, now digitized and interactive

The latter half of the 19th century is what Canadian psychologist Michael Friendly refers to as the "Golden Age of statistical graphics"[6]. This period produced famous examples of data visualization, including John Snow's map of cholera outbreaks in the London epidemic of 1854, Charles Minard's 1869 chart showing the number of men in Napoleon's army during the infamous Russian campaign of 1812, and a new type of visualization from Florence Nightingale called the Rose Diagram. The Golden Age stemmed from the industrial revolution, the establishment of official government statistical offices in response to rising populations, and a growing recognition of the importance of numerical data in fields like social planning, medicine, military, industrialization, commerce, and transportation.

The 20th century, however, brought the greatest progress in data visualization because of the development of computing power[7]. The late 1950s and 1960s saw the adoption of the programming language FORTRAN, which allowed statistical data to be processed by computers. Visualizations from history could now be rendered in increasing detail and with interactive elements, such as the redrawing of Minard's figurative map. Since then, rapid technological development has enabled new visualization methods to be applied to much larger volumes of data.

Terminology

Because data visualization emerged at the intersection of fields like data science, statistics, and information, it has its own specific terminology. It is important to recognize the kind of data that a visualization represents, as different types of data call for different representations.

Categorical vs. Quantitative

Categorical data (also referred to as qualitative data) are descriptive information about characteristics which are difficult to define or measure, or cannot be expressed numerically[8]. Objects are grouped together based upon similarities defined by the scientist. Categories themselves can either be nominal, meaning they have no order, or ordinal, meaning there is an order among them. Examples of nominal categorical data include gender, flavor, and texture. A well-known example of ordinal categorical data is the Likert scale.

Quantitative data are measures of values or counts that can be expressed numerically[9]. These values can either be discrete or continuous. Discrete variables take on separate, countable values (often whole numbers), while continuous variables can take on any value within a range on a continuous scale. Examples of discrete quantitative data include word count or population. Examples of continuous quantitative data include temperature or weight.
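The distinctions above can be sketched as a small classifier. This is only a heuristic under stated assumptions: the function name classify_column, the returned labels, and the five-point LIKERT_ORDER list are hypothetical, and the storage type of a value does not always reveal its measurement level (a numeric ZIP code, for example, is still nominal).

```python
from typing import Iterable

# Hypothetical ordering for a five-point Likert item (ordinal categories).
LIKERT_ORDER = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def classify_column(values: Iterable[object]) -> str:
    """Guess a data type from Python value types (a heuristic, not a rule)."""
    items = list(values)
    if not items:
        raise ValueError("cannot classify an empty column")
    # bool is a subclass of int, so treat it as categorical before the int check.
    if any(isinstance(v, bool) for v in items):
        return "categorical (nominal)"
    # Whole-number counts, such as word count or population.
    if all(isinstance(v, int) for v in items):
        return "quantitative (discrete)"
    # Measurements on a continuous scale, such as temperature or weight.
    if all(isinstance(v, (int, float)) for v in items):
        return "quantitative (continuous)"
    # Text answers that all belong to a known ordered scale.
    if all(isinstance(v, str) and v.lower() in LIKERT_ORDER for v in items):
        return "categorical (ordinal)"
    # Any other labels are treated as unordered categories.
    return "categorical (nominal)"


print(classify_column([120, 98, 143]))
print(classify_column([36.6, 37.1]))
print(classify_column(["agree", "neutral"]))
print(classify_column(["sweet", "sour"]))
```

In practice the designer, not the code, decides the measurement level; a check like this is at most a first pass before choosing a chart type.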

Infographics

Once a scientist recognizes which kind of data they are working with, they can choose the type of visualization that best portrays the information. The designer must choose among a variety of infographic types.

Statistical

Statistical graphics show trends in, or distributions of, numbers. These include diagrams, charts, graphs, tables, and lists[10].

Cartograms distort and resize elements to convey information
Cartograms

Cartograms are maps which distort reality to convey information. They resize, exaggerate, and emphasize certain variables in a proportionate manner. Types of cartograms include density-equalizing, in which areas bulge out in accordance with the featured variable; non-contiguous, in which the objects can move around freely; and Dorling, in which the objects are represented as shapes to bring forth easily recognizable patterns[11].
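As a rough illustration of resizing "in a proportionate manner", the sketch below sizes the circles of a Dorling-style cartogram so that each circle's area, rather than its radius, is proportional to the variable. The function name dorling_radii, the max_radius parameter, and the region names and population figures are made up for the example; placing the circles so they do not overlap is a separate step not shown here.

```python
import math


def dorling_radii(values, max_radius=40.0):
    """Map each region's value to a circle radius with area proportional to value.

    Assumes all values are non-negative and at least one is positive.
    """
    largest = max(values.values())
    if largest <= 0:
        raise ValueError("at least one value must be positive")
    # Area grows with radius squared, so the radius uses the square root of the share.
    return {name: max_radius * math.sqrt(v / largest) for name, v in values.items()}


# Made-up populations for two hypothetical regions.
populations = {"North": 4_000_000, "South": 1_000_000}
radii = dorling_radii(populations)
print(radii)
```

Scaling the radius directly by the value would make the larger region look sixteen times bigger instead of four, which is why the square root is used.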

Time Series

A time series is a collection of observations obtained through repeated measurements across a specific timeline[12]. Companies often use these to track revenue and profit, where individual values matter less than relative changes. Designers can choose to represent a time series through stacked graphs, horizon graphs, index charts, or small multiples.
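An index chart is one way to emphasize relative change over raw values: every series is rescaled so that a chosen base period equals 100. The sketch below shows only that rescaling step; the function name index_series, its parameters, and the revenue figures are hypothetical.

```python
def index_series(values, base_position=0, base_value=100.0):
    """Rescale a series so the observation at base_position equals base_value."""
    base = values[base_position]
    if base == 0:
        raise ValueError("the base observation must be non-zero")
    return [base_value * v / base for v in values]


# Made-up yearly revenue for two hypothetical companies of very different size.
revenue = {
    "Company A": [200.0, 220.0, 250.0],
    "Company B": [50.0, 60.0, 45.0],
}
indexed = {name: index_series(series) for name, series in revenue.items()}
print(indexed)
```

After indexing, both series start at 100, so a reader can compare growth rates even though the companies differ in absolute revenue.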

A food pyramid diagram is one of the most common types of hierarchical visualizations
Hierarchies

Hierarchical visualizations show natural hierarchies within variable categories. For example, the food pyramid is one of the most common hierarchical infographics, in which levels of the pyramid directly correspond to how often each kind of food should be incorporated into the human diet.

Networks

Network infographics display relationships between elements. These are often used when investigating social connections between people, like friendships or familial relationships. The most common type of network visualization is a force-directed layout, where nodes are connected by links and often repel each other when not related[13].
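The idea behind a force-directed layout can be sketched in a few lines: every pair of nodes repels, linked pairs also attract, and positions are nudged repeatedly until they settle. The version below is a simplified illustration loosely in the style of spring-based algorithms, not the method of any particular library; the function name, parameters, force formulas, and the three-person example are assumptions made for this sketch.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: (x, y)} from a simple spring-style simulation.

    nodes is a non-empty list of labels; edges is a list of (a, b) pairs.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # preferred distance between linked nodes
    linked = {frozenset(edge) for edge in edges}
    step = 0.1  # maximum movement per iteration; shrinks so the layout settles

    for _ in range(iterations):
        push = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist  # every pair repels
                if frozenset((a, b)) in linked:
                    force -= dist * dist / k  # linked pairs also attract
                fx, fy = dx / dist * force, dy / dist * force
                push[a][0] += fx
                push[a][1] += fy
                push[b][0] -= fx
                push[b][1] -= fy
        for n in nodes:
            length = math.hypot(push[n][0], push[n][1])
            if length > 0.0:
                # Move along the net force, but never farther than the current step.
                scale = min(length, step) / length
                pos[n][0] += push[n][0] * scale
                pos[n][1] += push[n][1] * scale
        step *= 0.98
    return {n: (p[0], p[1]) for n, p in pos.items()}


# Hypothetical example: Ana and Ben are friends, Cam is connected to no one.
people = ["Ana", "Ben", "Cam"]
friendships = [("Ana", "Ben")]
layout = force_directed_layout(people, friendships)
print(layout)
```

In the resulting layout the linked pair should sit close together while the unconnected node drifts away, which is the visual pattern that makes clusters of relationships easy to spot.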

While these visualizations can be effective on their own, there are many modern infographics which combine multiple types into one. These graphics may also include supplemental features, like text or illustrations[14].

Techniques

Data visualization is used in many different fields and areas of study to inform audiences which include scholars, policy makers, corporate figures, the general public, and more. Data scientists choose methods of visualization which best suit the given dataset and data context.

Exploratory vs. Explanatory

An exploratory data visualization allows those working with large, noisy datasets to make sense of what is inside. Translating to a visual medium can bring forth dominating features otherwise hidden, such as patterns, trends, or anomalous outliers. Exploration is useful when the data have a high level of granularity, since it avoids oversimplifying or excessively stripping the dataset.

In contrast, an explanatory data visualization serves the purpose of telling the data story to an audience. This method is appropriate for when the underlying themes within the dataset are already known. The scientist decides how that data will be represented with the intention of highlighting those themes. Such visualizations may stand on their own, or be part of a larger presentation, such as a speech, newspaper article, or a report.

Exploratory data visualization is well-suited for the data analysis phase, while explanatory data representations are for the communication phase of the scientific method[15].

Informative vs. Persuasive

Soares' Press Freedom: Countries to Watch persuasive visualization

An informative visualization aims for a neutral presentation of the facts in such a way that will educate its audience members. In this case, the audience is allowed to make their own decisions about the topic at hand, without any outside persuasion from the designer. Informative visualizations are often associated with broad data sets, and are intended to turn raw data into information that is easily digestible for the consumer[16].

Examples of informative visualization include world map-style representations, line graphs, and 3-D virtual building or town plan designs.

A persuasive visualization is created with the purpose of influencing others or making a desired message more persuasive[17]. With this technique, the designer is directly communicating with the intended audience. The data presented within the visualization is hand-chosen to support the designer's point of view, and is presented carefully so as to convince others of this perspective as well.

An example of persuasive visualization is Marc Soares' Global Press Freedom: Countries to Watch, which investigates the press freedom score for ten different countries and provides additional commentary about each of them. Soares provides a clear visual of countries which are most at risk, conveying his message[18].

Data and Information Art

One of Miebach's weather pattern sculptures

Also known as information art or informatism, data art often entails the unidirectional encoding of information. As a result, the audience may not be able to make sense of the visual presentation to understand the underlying information. Data art translates the data into a visual form without trying to convey its meaning to others. The designer may condense the data, translate it to a new medium, or simply make it beautiful with no intention for the audience to be able to extract anything from it other than pure enjoyment.

An example of data art is Nathalie Miebach's sculptures, made from data gathered on shifting weather patterns within a 24-hour period. Miebach further translates weather data points into a musical score to be performed by an orchestra. Her data art explores the intersection of science, data, musical performance, and sculpture for aesthetic purposes[19].

Ethical Dilemmas

Because data visualization is grounded in the individual perspectives and interpretations of both the designer and the audience, the ethics behind these processes have been brought into question regarding fairness, bias, and accountability.

Bias

Behind each visual representation, there is a person or a group of people who have made each decision from the dataset's conceptualization to the details of the final visualization. As a result, bias can be inherited, whether intentionally or unintentionally, and alter a visualization's overall influence upon others.

The God Trick

Most of the visualizations that are seen on a day-to-day basis are statistical infographics presented as neutral modes of sharing facts. Charts and diagrams with clean, minimalistic appearances have been generally accepted as non-biased. This is due to what philosopher Donna Haraway termed "the god trick" in the 1980s[20]. The view from nowhere (from a distance, from up above, like a god) may be data visualization's most prominent feature. This method attempts to mask the people, the methods, and the questions that lie within the dataset. The approach has been contested because several cases have shown that, despite its intentions, some perspective is still incorporated in what the designer decides to show and not to show. These choices also tend to favor the dominant group's perspective[21]. Because all data are local in terms of situational and geographical context, meaning that data are neither universal nor singular, "the god trick" is regarded as a failed method of eliminating bias from data visualization[22].

The Language of Data

The phrase "raw data" is ubiquitous in the data science world. It is used to refer to data before it is processed and manipulated. The idea that the numbers haven't yet been tampered with by a person or people implies that data, in its purest form, is neutral. The appeal of this phrase feeds the growing belief that data technology is autonomous and objective. Other words often associated with data such as "collected", "entered", "mined", "stored", and "interpreted" contribute to this perspective as well. This can have a dangerous effect because it allows those behind data visualizations to avoid accountability. Instead, they can push blame onto the numbers themselves, or the technologies which are being used[23].

Data Literacy

Data literacy is the ability to read, understand, create, and communicate data as information. The demographics of the data science field are not diverse, nor do they accurately represent the population as a whole. The teams creating data visualizations rarely look like those who are most affected by the consequences[24].

Counterdata Visualization

Counterdata (femicides), The Guerrilla Girls, A Sort of Joy, Stephanie Dinkins exhibit

References

  1. “A Brief History of Data Visualization.” Accessed January 21, 2022. https://www.dundas.com/resources/blogs/introduction-to-business-intelligence/brief-history-data-visualization.
  2. Manuela Aparicio, Carlos J. Costa. “Data Visualization.” Communication Design Quarterly, November 2014.
  3. Few, Stephen. “Data Visualization - Past, Present, and Future,” n.d., 12.
  4. Healy, Kieran. Data Visualization: A Practical Introduction. Princeton University Press, 2018.
  5. “A Brief History of Data Visualization.” Accessed January 21, 2022. https://www.dundas.com/resources/blogs/introduction-to-business-intelligence/brief-history-data-visualization.
  6. Friendly, Michael. “The Golden Age of Statistical Graphics.” Statistical Science 23, no. 4 (November 1, 2008). https://doi.org/10.1214/08-STS268.
  7. The Interaction Design Foundation. “Information Visualization – A Brief 20th and 21st Century History.” Accessed January 21, 2022. https://www.interaction-design.org/literature/article/information-visualization-a-brief-20th-and-21st-century-history.
  8. Helmenstine, Anne. “Qualitative and Quantitative Data - Definitions and Examples.” Science Notes and Projects (blog), March 27, 2017. https://sciencenotes.org/qualitative-quantitative-data-definitions-examples/.
  9. Australian Bureau of Statistics. “Statistical Language - Quantitative and Qualitative Data.” Accessed January 22, 2022. https://www.abs.gov.au/websitedbs/D3310114.nsf/Home/Statistical+Language+-+quantitative+and+qualitative+data.
  10. Siricharoen, Waralak. “Infographics the New Communication Tools in Digital Age,” September 13, 2013.
  11. GISGeography. “Cartogram Maps: Data Visualization with Exaggeration.” GIS Geography, September 18, 2016. https://gisgeography.com/cartogram-maps/.
  12. Australian Bureau of Statistics. “Time Series Analysis: The Basics.” Accessed January 22, 2022. https://www.abs.gov.au/websitedbs/d3310114.nsf/home/time+series+analysis:+the+basics.
  13. Kobourov, Stephen G. “Force-Directed Drawing Algorithms,” n.d., 26.
  14. Van Slembrouck, Paul. “Analyzing the Top 30 Infographics on Visually,” May 15, 2012. https://rockcontent.com/blog/top-30-viral-infographics/.
  15. “1. Classifications of Visualizations - Designing Data Visualizations [Book].” Accessed January 21, 2022. https://www.oreilly.com/library/view/designing-data-visualizations/9781449314774/ch01.html.
  16. The Interaction Design Foundation. “What Is Information Visualization?” Accessed January 21, 2022. https://www.interaction-design.org/literature/topics/information-visualization.
  17. Pandey, Anshul Vikram, Anjali Manivannan, Oded Nov, Margaret L. Satterthwaite, and Enrico Bertini. “The Persuasive Power of Data Visualization.” SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, July 31, 2014. https://papers.ssrn.com/abstract=2474695.
  18. Murray, Eva. “Data Visualization And The Power Of Persuasion.” Forbes. Accessed January 21, 2022. https://www.forbes.com/sites/evamurray/2019/02/11/data-visualization-and-the-power-of-persuasion/.
  19. Nordic APIs. “6 Inspiring Examples of Data-Driven Art,” February 17, 2020. https://nordicapis.com/6-inspiring-examples-of-data-driven-art/.
  20. Haraway, Donna. “Situated Knowledges: The Science Question in Feminism and the Privilege of Partial Perspective.” Feminist Studies 14, no. 3 (1988): 575–99. https://doi.org/10.2307/3178066.
  21. D’Ignazio, Catherine, and Lauren Klein. “3. On Rational, Scientific, Objective Viewpoints from Mythical, Imaginary, Impossible Standpoints.” In Data Feminism, 2020. https://data-feminism.mitpress.mit.edu/pub/5evfe9yd/release/2.
  22. Loukissas, Yanni Alexander. All Data Are Local: Thinking Critically in a Data-Driven Society. MIT Press, 2019.
  23. Gitelman, Lisa. Raw Data Is an Oxymoron. MIT Press, 2013.
  24. “1. The Power Chapter · Data Feminism.” Accessed January 22, 2022. https://data-feminism.mitpress.mit.edu/pub/vi8obxh7/release/4.