Difference between revisions of "Data Visualization"

From SI410

Revision as of 17:09, 22 January 2022

Data visualization (also called information visualization or statistical visualization) is defined as the design, development, and application of computer-generated graphical representations of data[1]. Today, computers can be used to process and display large amounts of data in a way that is efficient, easily accessible, and understandable. The human mind is visual by nature. As a result, visualization is found everywhere, ranging from lines and points on a graph to the standardized symbols called emojis. Whether the underlying information encompasses strict quantitative data or an individual's wish to convey a certain emotion, data visualization is, on a basic level, a method of communicating information and ideas[2].

As the amount of data accumulated due to the rise of the Internet increasingly outsizes what existed before, the need to wrangle, process, and analyze this information has increased as well. Large industries and organizations particularly value the tools used to represent data because they enable decision makers to comprehend information and form an opinion in an efficient, profitable manner[3]. Scientists must choose how to represent their data and consider the audience they intend to show it to, because data visualization influences how people make sense of the information before them[4]. The processes and decisions which go into creating these representations have raised ethical concerns regarding fairness, bias, and accountability.

History

The first visual representation of statistical data is believed to have been provided by Flemish astronomer Michael Florent van Langren in 1644. His line graph records the twelve estimates known at the time of the difference in longitude between Rome and Toledo, along with the name of each astronomer who provided an estimate. The visualization was notable for its portrayal of the wide variation among those estimates[5]. In the 18th century, thematic mapping originated in an attempt to catalogue geologic, economic, and medical data, which introduced abstract graphs of functions, measurement error, and collection of empirical data.

Minard's chart from the 1800s, now digitized and interactive

The latter half of the 19th century is what Canadian psychologist Michael Friendly refers to as the "Golden Age of statistical graphics"[6]. This time period hosts famous examples of data visualization, including John Snow's map of cholera outbreaks in the London epidemic of 1854, Charles Minard's 1869 chart showing the number of men in Napoleon's army during the infamous Russian campaign of 1812, and a new type of visualization called the Rose Diagram from Florence Nightingale. The Golden Age stemmed from the industrial revolution, the establishment of official government statistical offices due to rising population, and a growing recognition of the importance of numerical data in fields like social planning, medicine, military, industrialization, commerce, and transportation.

The 20th century, however, brought about the greatest progression in data visualization because of the development of computing power[7]. The late 1950s and 1960s saw the adoption of the programming language FORTRAN, which allowed statistical data to be processed by computers. Visualizations from history could now be rendered in increasing detail and with interactive elements, such as the redrawing of Minard's figurative map. Since then, rapid technological development has allowed new visualization methods to be pioneered and applied to much larger scales of data.

Terminology

Because data visualization emerged from the intersection of fields like data science, statistics, and information, it has its own specific terminology. It is important to recognize the kind of data that a visualization is representing, as different types of data call for different representations.

Categorical vs. Quantitative

Categorical data (also referred to as qualitative data) are descriptive information about characteristics which are difficult to define or measure, or cannot be expressed numerically[8]. Objects are grouped together based upon similarities defined by the scientist. Categories themselves can either be nominal, meaning they have no order, or ordinal, meaning there is order between them. Examples of nominal categorical data include gender, flavor, and texture. A well-known example of ordinal categorical data is the Likert scale.

Quantitative data are measures of values or counts that can be expressed numerically[9]. These values can either be discrete or continuous. Discrete variables take on countable, separate values (such as whole-number counts), while continuous variables can take on any value within a range on a continuous scale. Examples of discrete quantitative data include word count or population. Examples of continuous quantitative data include temperature or weight.
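The distinction between these data types can be sketched in a few lines of Python. This is only an illustration: the variable names, the sample values, and the five-point Likert wording are all assumptions made for the example, not data from any cited source. It shows that nominal data support counting, ordinal data additionally support ranking, and quantitative data support arithmetic.

```python
from collections import Counter
from statistics import mean

# Hypothetical five-point Likert scale, listed from lowest to highest rank.
LIKERT_ORDER = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]

# Hypothetical sample data, one list per data type.
flavors = ["vanilla", "mint", "vanilla", "chocolate"]                     # nominal
responses = ["Agree", "Neutral", "Strongly agree", "Agree", "Disagree"]   # ordinal
word_counts = [120, 95, 143]                                              # discrete
temperatures = [21.4, 19.8, 22.1]                                         # continuous

# Nominal: categories have no order, so counting is the meaningful summary.
flavor_counts = Counter(flavors)

# Ordinal: categories can be ranked, so a median response is meaningful.
ranks = sorted(LIKERT_ORDER.index(r) for r in responses)
median_response = LIKERT_ORDER[ranks[len(ranks) // 2]]

# Quantitative: arithmetic such as sums and means is meaningful.
total_words = sum(word_counts)
mean_temperature = mean(temperatures)
```

A bar chart of `flavor_counts` would suit the nominal data, whereas the quantitative lists could be plotted directly on a numeric axis.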

Infographics

Once a scientist is able to recognize which kind of data they are working with, the type of visualization which best portrays the information may vary. The designer must choose between a variety of infographic types.

Statistical

Statistical graphics show trends in, or distributions of, numbers. These include diagrams, charts, graphs, tables, and lists[10].

Cartograms

Cartograms are maps which distort reality to convey information. They resize, exaggerate, and emphasize certain variables in a proportionate manner. Types of cartograms include density-equalizing, in which areas bulge out in accordance with the featured variable; non-contiguous, in which the objects can move around freely; and Dorling, in which the objects are represented as shapes to bring forth easily recognizable patterns[11].
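As a rough illustration of proportionate resizing, a Dorling-style cartogram that uses circles can scale each circle so that its area, rather than its radius, is proportional to the featured variable. The function name `dorling_radius`, its parameters, and the sample figures below are hypothetical, chosen only for this sketch.

```python
import math

def dorling_radius(value, max_value, max_radius=50.0):
    """Radius of a circle whose area is proportional to `value`.

    The largest value gets `max_radius`; since area grows with the
    square of the radius, the radius scales with the square root.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)

# Hypothetical regions and values (illustrative only).
populations = {"Region A": 100, "Region B": 25}
largest = max(populations.values())
radii = {name: dorling_radius(p, largest) for name, p in populations.items()}
```

Here a region with one quarter of the largest value receives half the radius, so its circle covers one quarter of the area.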

Time Series

A time series is a collection of observations obtained through repeated measurements across a specific timeline[12]. Companies often use time series to track revenue and profit, where relative changes matter more than individual values. Designers can choose to represent a time series through stacked graphs, horizon graphs, index charts, or small multiples.
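The emphasis on relative change is what an index chart captures: each series is rescaled against a chosen base period so that series of very different magnitudes can share one axis. The following is a minimal sketch; the function name `to_index` and the revenue and profit figures are assumptions made for illustration.

```python
def to_index(values, base_position=0, base_value=100.0):
    """Rescale a series so the observation at `base_position` equals `base_value`."""
    base = values[base_position]
    if base == 0:
        raise ValueError("the base observation must be non-zero")
    return [base_value * v / base for v in values]

# Hypothetical yearly figures (illustrative only).
revenue = [200.0, 220.0, 250.0]
profit = [20.0, 25.0, 22.0]

revenue_index = to_index(revenue)
profit_index = to_index(profit)
```

Although revenue is ten times larger than profit in absolute terms, both indexed series start at 100, so their growth paths can be compared directly.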

Hierarchies

Hierarchical visualizations show natural hierarchies within variable categories. For example, the food pyramid is a widely recognized hierarchical infographic, in which levels of the pyramid directly correspond to how often each kind of food should be incorporated into the human diet.

Networks

Network infographics display relationships between elements. These are often used when investigating social connections between people, like friendships or familial relationships. The most common type of network visualization is a force-directed layout, where nodes are connected by links and often repel each other when not related[13].
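The idea behind a force-directed layout can be sketched as a small simulation in which every pair of nodes repels and every link pulls its two endpoints together. The code below is a simplified illustration rather than any particular published algorithm; the function name `force_layout`, its parameters, and the three-node example graph are assumptions made for this sketch.

```python
import math

def force_layout(nodes, links, iterations=200, k=1.0, step=0.05):
    """Return a dict mapping each node to an (x, y) position."""
    n = len(nodes)
    # Deterministic start: place the nodes evenly around a unit circle.
    pos = {
        node: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, node in enumerate(nodes)
    }
    for _ in range(iterations):
        force = {node: [0.0, 0.0] for node in nodes}
        # Repulsion: every pair of nodes pushes apart, more strongly when close.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-6)
                push = k * k / dist
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction: linked nodes pull together, more strongly when far apart.
        for a, b in links:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-6)
            pull = dist * dist / k
            force[a][0] -= pull * dx / dist
            force[a][1] -= pull * dy / dist
            force[b][0] += pull * dx / dist
            force[b][1] += pull * dy / dist
        # Move each node along its net force, capping the distance moved per step.
        for node in nodes:
            fx, fy = force[node]
            magnitude = math.hypot(fx, fy)
            if magnitude > 0:
                scale = min(magnitude, 1.0) * step / magnitude
                pos[node][0] += fx * scale
                pos[node][1] += fy * scale
    return {node: tuple(p) for node, p in pos.items()}

# Hypothetical friendship network: "a" and "b" are friends, "c" knows neither.
layout = force_layout(["a", "b", "c"], [("a", "b")])
```

In this example the linked pair settles close together while the unrelated node drifts away from both, which is the visual pattern the layout is meant to reveal.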

While these visualizations can be effective on their own, there are many modern infographics which combine multiple types into one. These graphics may also include supplemental features, like text or illustrations[14].

Techniques

Data visualization is used in many different fields and areas of study to inform audiences which include scholars, policy makers, corporate figures, the general public, and more. Data scientists choose methods of visualization which best suit the given dataset and data context.

Exploratory versus Explanatory

An exploratory data visualization allows those working with large, noisy datasets to make sense of what is inside. Translating to a visual medium can bring forth dominating features otherwise hidden, such as patterns, trends, or anomalous outliers. Exploration is useful when there is a high level of granularity in the data, preventing oversimplification or excessive stripping of the dataset.

In contrast, an explanatory data visualization serves the purpose of telling the data story to an audience. This method is appropriate for when the underlying themes within the dataset are already known. The scientist decides how that data will be represented with the intention of highlighting those themes. Such visualizations may stand on their own, or be part of a larger presentation, such as a speech, newspaper article, or a report.

Exploratory data visualization is well-suited for the data analysis phase, while explanatory data representations are for the communication phase of the scientific method[15].

Informative versus Persuasive

Soares' Press Freedom: Countries to Watch persuasive visualization

An informative visualization aims for a neutral presentation of the facts in such a way that will educate its audience members. In this case, the audience is allowed to make their own decisions about the topic at hand, without any outside persuasion from the designer. Informative visualizations are often associated with broad data sets, and are intended to turn raw data into information that is easily digestible for the consumer[16].

Examples of informative visualization include world map-style representations, line graphs, and 3-D virtual building or town plan designs.

A persuasive visualization is created with the purpose of influencing others or making a desired message more persuasive[17]. With this technique, the designer is directly communicating with the intended audience. The data presented within the visualization is hand-chosen to support the designer's point of view, and is presented carefully so as to convince others of this perspective as well.

An example of persuasive visualization is Marc Soares' Global Press Freedom: Countries to Watch, which investigates the press freedom score for ten different countries and provides additional commentary about each of them. Soares provides a clear visual of countries which are most at risk, conveying his message[18].

Data art

One of Miebach's weather pattern sculptures

Also known as information art or informatism, data art often entails the unidirectional encoding of information. As a result, the audience may not be able to make sense of the visual presentation to understand the underlying information. Data art translates the data into a visual form without trying to convey its meaning to others. The designer may condense the data, translate it to a new medium, or simply make it beautiful with no intention for the audience to be able to extract anything from it other than pure enjoyment.

An example of data art is Nathalie Miebach's sculptures made from gathering data on shifting weather patterns within a 24-hour period. Further translating weather data points into a musical score to be performed by an orchestra, Miebach's data art explores the intersection of science, data, musical performance, and sculpture for aesthetic purposes.

References

  1. “A Brief History of Data Visualization.” Accessed January 21, 2022. https://www.dundas.com/resources/blogs/introduction-to-business-intelligence/brief-history-data-visualization.
  2. Aparicio, Manuela, and Carlos J. Costa. “Data Visualization.” Communication Design Quarterly, November 2014.
  3. Few, Stephen. “Data Visualization - Past, Present, and Future,” n.d., 12.
  4. Healy, Kieran. Data Visualization: A Practical Introduction. Princeton University Press, 2018.
  5. “A Brief History of Data Visualization.” Accessed January 21, 2022. https://www.dundas.com/resources/blogs/introduction-to-business-intelligence/brief-history-data-visualization.
  6. Friendly, Michael. “The Golden Age of Statistical Graphics.” Statistical Science 23, no. 4 (November 1, 2008). https://doi.org/10.1214/08-STS268.
  7. The Interaction Design Foundation. “Information Visualization – A Brief 20th and 21st Century History.” Accessed January 21, 2022. https://www.interaction-design.org/literature/article/information-visualization-a-brief-20th-and-21st-century-history.
  8. Helmenstine, Anne. “Qualitative and Quantitative Data - Definitions and Examples.” Science Notes and Projects (blog), March 27, 2017. https://sciencenotes.org/qualitative-quantitative-data-definitions-examples/.
  9. Australian Bureau of Statistics. “Statistical Language - Quantitative and Qualitative Data.” Accessed January 22, 2022. https://www.abs.gov.au/websitedbs/D3310114.nsf/Home/Statistical+Language+-+quantitative+and+qualitative+data.
  10. Siricharoen, Waralak. “Infographics the New Communication Tools in Digital Age,” September 13, 2013.
  11. GISGeography. “Cartogram Maps: Data Visualization with Exaggeration.” GIS Geography, September 18, 2016. https://gisgeography.com/cartogram-maps/.
  12. Australian Bureau of Statistics. “Time Series Analysis: The Basics.” Accessed January 22, 2022. https://www.abs.gov.au/websitedbs/d3310114.nsf/home/time+series+analysis:+the+basics.
  13. Kobourov, Stephen G. “Force-Directed Drawing Algorithms,” n.d., 26.
  14. Van Slembrouck, Paul. “Analyzing the Top 30 Infographics on Visually,” May 15, 2012. https://rockcontent.com/blog/top-30-viral-infographics/.
  15. “1. Classifications of Visualizations - Designing Data Visualizations [Book].” Accessed January 21, 2022. https://www.oreilly.com/library/view/designing-data-visualizations/9781449314774/ch01.html.
  16. The Interaction Design Foundation. “What Is Information Visualization?” Accessed January 21, 2022. https://www.interaction-design.org/literature/topics/information-visualization.
  17. Pandey, Anshul Vikram, Anjali Manivannan, Oded Nov, Margaret L. Satterthwaite, and Enrico Bertini. “The Persuasive Power of Data Visualization.” SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, July 31, 2014. https://papers.ssrn.com/abstract=2474695.
  18. Murray, Eva. “Data Visualization And The Power Of Persuasion.” Forbes. Accessed January 21, 2022. https://www.forbes.com/sites/evamurray/2019/02/11/data-visualization-and-the-power-of-persuasion/.