Metadata

From SI410

Metadata, specifically the metadata of digital content, can broadly be defined as data about data.[1] It encompasses the information that summarizes the history and current state of many forms of digital information, including, but not limited to, emails, cellular phones, social media accounts and posts, digital files, and software applications.[1] Metadata is often created as a direct byproduct of viewing and creating other data,[2] and this frequently happens implicitly, without the user's knowledge. Ethical issues have emerged around the biased design of metadata classification structures, the potential misuse of users' metadata, and violations of privacy rights.[1] New methods of automatically scraping metadata make it easier to collect and analyze, increasing the risks of misuse and privacy violations.[3] There is currently no consensus on the responsibilities of companies, governments, and other interested parties regarding the ethical use of metadata.[3]


History

Metadata is not the data itself, but the additional information about the data.[4]

Metadata is data that describes other pieces of data. Historically, the term referred to information recorded in physical formats, such as card catalogs, for indexing and organizing library collections. As the use of digital data has grown, the term has been adopted for the additional information stored alongside digital data that fills a similar role.[5] The term covers many different types of information. The National Information Standards Organization (NISO) divides metadata into four categories based on its use: descriptive metadata, structural metadata, administrative metadata, and markup languages.[6] This information is useful for resource discovery, organization of resources, interoperability, unique digital identification, and preservation of data.[6] Preserving metadata is therefore important to the structure of many kinds of digital databases.

1. Descriptive: “For finding or understanding a resource.”

  • This includes identifying factors such as the name of the document, the author, or related search terms.

2. Structural: “Relationships of sections of resources to one another”.

  • This includes page numbers or linking data.

3. Administrative: Technical and management information about a resource.

  • This includes data about the time of creation, creation methods, permissions, and other technical information.

4. Markup Languages: "Integrates metadata and flags for other structural or semantic features within content."

  • This includes markers for structural elements within a piece of data, such as paragraphs, headings, and lists.
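To make the four categories concrete, the following is a minimal Python sketch of a metadata record for a single document, with fields grouped by NISO category. The class name `MetadataRecord` and every field name and value are hypothetical choices for illustration, not part of any NISO schema.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical metadata for one document, grouped by NISO category."""

    # Descriptive: helps users find or understand the resource.
    descriptive: dict = field(default_factory=dict)
    # Structural: how parts of the resource relate to one another.
    structural: dict = field(default_factory=dict)
    # Administrative: technical and management details.
    administrative: dict = field(default_factory=dict)
    # Markup: flags embedded within the content itself (e.g. headings).
    markup: list = field(default_factory=list)

    def categories_with(self, key: str) -> list:
        """Return the names of the dict-based categories that contain `key`."""
        found = []
        for name in ("descriptive", "structural", "administrative"):
            if key in getattr(self, name):
                found.append(name)
        return found


record = MetadataRecord(
    descriptive={"title": "Annual Report", "author": "J. Doe", "keywords": ["budget"]},
    structural={"pages": 12, "next_section": "appendix-a"},
    administrative={"created": "2021-03-25T10:00:00", "format": "PDF", "access": "staff"},
    markup=["<h1>Summary</h1>", "<p>Spending rose.</p>"],
)

print(record.categories_with("author"))  # ['descriptive']
print(record.categories_with("format"))  # ['administrative']
```

Keeping the categories as separate fields mirrors the NISO division; a real schema would fix the allowed keys in each category rather than accept arbitrary dictionaries.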


Collecting Metadata

Metadata can be collected and viewed in two ways: manually and automatically. The manual approach involves opening a digital file or other data in viewing or editing software and inspecting any metadata saved with it. The automatic approach, also known as metadata discovery or harvesting, has larger implications because of the volume of data it can process. Companies such as Octopai have begun to use machine learning to manage metadata within a dataset and even to map out connections and trends in the metadata.[5]
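The automatic approach can be sketched with Python's standard library. The following minimal example, assuming a local directory of files, harvests the metadata the file system already keeps (size, modification time, permissions) without opening any file. The function name `harvest_file_metadata` is hypothetical, and commercial harvesting tools operate at far larger scale than this sketch.

```python
from datetime import datetime, timezone
from pathlib import Path


def harvest_file_metadata(root: str) -> list[dict]:
    """Walk `root` and collect basic file-system metadata for every file.

    Only what the operating system records is gathered (size, modification
    time, permissions); the files' contents are never read.
    """
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip directories and other non-regular entries
        info = path.stat()
        records.append({
            "path": str(path.relative_to(root)),
            "size_bytes": info.st_size,
            # st_mtime is seconds since the epoch; convert to an ISO 8601 string.
            "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
            # Keep only the permission bits, shown in octal (e.g. '0o644').
            "permissions": oct(info.st_mode & 0o777),
        })
    return records
```

Calling `harvest_file_metadata(".")` on a project folder returns one dictionary per file, which is enough to sort documents by age or size without ever opening them.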

Uses of Metadata

Because types of metadata, and of data in general, differ greatly, the exact information a given piece of metadata stores depends largely on the data it is associated with. A common theme across types of metadata is that they store utility information that applications can use to perform further functions. Metadata also often includes identifying information about the data itself so that computer systems can better analyze and recognize its features.

Metadata of a webpage[4]
Metadata of a video[7]
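Webpage metadata is often stored in HTML `<meta>` tags that browsers do not display but that search engines and link-preview services read. A minimal sketch using Python's built-in `html.parser` module; the sample page and the `MetaTagParser` class are hypothetical examples.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Most tags use name=...; Open Graph tags use property=...
        key = attributes.get("name") or attributes.get("property")
        if key and "content" in attributes:
            self.meta[key] = attributes["content"]


page = """
<html><head>
  <title>Example</title>
  <meta name="description" content="A short summary of the page">
  <meta name="author" content="Jane Doe">
  <meta property="og:image" content="https://example.com/preview.png">
</head><body><p>Visible content</p></body></html>
"""

parser = MetaTagParser()
parser.feed(page)
print(parser.meta)
```

None of these values appear in the rendered page body, yet any program that fetches the HTML can read them.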

Images

Images often include metadata that provides more information about the history, origin, and context of digital pictures. Image metadata can describe technical aspects of the photo itself,[8] such as the camera used to take the picture, the shutter speed, the focal length, and the resolution in dots per inch. Images can also include copyright and licensing information indicating who holds permission to use them.[9] Some images record the location where they were created or taken, which allows users to extract GPS coordinates from images that include location metadata. Other metadata describes what is seen in the picture, helping search engines and information retrieval systems index and identify images in response to search queries.[8]
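Exif stores location as degrees, minutes, and seconds, with the hemisphere recorded in a separate tag, so extracting usable GPS coordinates usually involves converting them to decimal degrees: degrees + minutes/60 + seconds/3600, made negative for south or west. A minimal Python sketch; the function name and the sample coordinates are hypothetical, and reading the raw tags from an actual image file would require an image library not shown here.

```python
def dms_to_decimal(dms: tuple[float, float, float], ref: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees.

    Exif stores GPSLatitude and GPSLongitude as three values (degrees,
    minutes, seconds) and records the hemisphere separately in
    GPSLatitudeRef ('N'/'S') or GPSLongitudeRef ('E'/'W').
    """
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative.
    return -value if ref.upper() in ("S", "W") else value


# Hypothetical values as they might appear in a photo's GPS tags.
lat = dms_to_decimal((42, 16, 48.0), "N")
lon = dms_to_decimal((83, 44, 24.0), "W")
print(round(lat, 4), round(lon, 4))  # 42.28 -83.74
```

The resulting pair can be pasted directly into a mapping service, which is why location tags in shared photos raise the privacy concerns discussed later in this article.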

Image metadata is stored in several standard formats, including the IPTC Information Interchange Model (IIM), the Extensible Metadata Platform (XMP), the Exchangeable Image File Format (Exif), the Dublin Core Metadata Initiative (DCMI) schema, and the Picture Licensing Universal System (PLUS).[10]

Videos

Videos use metadata much as images do. Video metadata usually describes features of the video itself, such as its video quality, frame rate (frames per second), and a description of what the video is about.[11] Information retrieval systems use this metadata to find relevant videos when users search on websites like YouTube. It also helps websites and services decide how to handle videos of varying quality. Video metadata can also provide historical context, such as how and when the video was made and GPS coordinates of where it was shot.[11]

Emails

Emails also carry metadata that gives context about their origin, history, and recipients.[12] Email metadata includes the protocol used to send the message, the time it was sent, the sender, and the recipients. Other metadata that is not normally displayed, such as geolocation, helps email services such as Microsoft Outlook identify spam.[12]
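Much of this email metadata lives in the message headers, which can be read without touching the message body. A minimal sketch using Python's standard `email` package; the sample message, addresses, and server names are invented for illustration.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

raw = """\
Received: from mail.example.org (mail.example.org [203.0.113.5])
    by mx.example.com with ESMTP; Mon, 15 Mar 2021 09:30:00 -0400
From: Alice <alice@example.org>
To: Bob <bob@example.com>, Carol <carol@example.com>
Date: Mon, 15 Mar 2021 09:29:58 -0400
Subject: Meeting notes
Message-ID: <abc123@example.org>

The body of the message is not needed to read any of the fields below.
"""

msg = message_from_string(raw)

# getaddresses() splits "Name <address>" strings into (name, address) pairs.
sender = getaddresses([msg["From"]])[0][1]
recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]
sent_at = parsedate_to_datetime(msg["Date"])
# Mail servers typically prepend a Received header as they relay a message,
# so these headers trace its route (here, one hop using ESMTP).
hops = msg.get_all("Received", [])

print(sender, recipients, sent_at.isoformat(), len(hops))
```

Sender, recipients, timestamp, and routing all come from headers alone, which is why encrypting only the body leaves this information exposed.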

Other Examples

  • Cellular Phones[1]
    • Phone number of every call received
    • Time of call
    • Duration of call
    • Location of caller and recipient
  • Web Browsers[1]
    • User’s IP address, ISP, device, and OS
    • Browser history
    • Cached data from websites
    • User login details from auto-fill

Real-World Examples of Metadata Collection

National Security Agency Metadata Collection Program

Following the 9/11 attacks, the National Security Agency (NSA) began a metadata collection program that gathered telephony metadata. The NSA collected the following types of telephony metadata: “comprehensive communications routing information, including but not limited to session identifying information (e.g., originating and terminating telephone number, International Mobile station Equipment Identity (IMEI) number, International Mobile Subscriber Identity (IMSI) number, etc.), trunk identifier, telephone calling card numbers, and time and duration of call.”[13]

Legal Justification

USA PATRIOT Act

The legal justification for the U.S. government's mass gathering of metadata lies in Section 215 of the Uniting and Strengthening America by Providing Appropriate Tools Required to Intercept and Obstruct Terrorism Act of 2001, also known as the USA Patriot Act, which made extensive modifications to the Foreign Intelligence Surveillance Act (FISA).[13][14] These modifications include expanding the circumstances under which surveillance can occur and the types of records that can be searched and obtained for FISA investigations, among other changes.[14]

Fourth Amendment

The government claimed that the NSA’s collection of metadata complied with the Fourth Amendment, which protects citizens from unreasonable searches and seizures[15] and requires probable cause before a search warrant is issued.[14] The government argued that its grounds were reasonable because all records collected are used to “find patterns and connections in preventing terrorist activity.”[15]

Court Rulings

Various court rulings were cited to justify the NSA metadata collection program. For example, in the 1967 case Katz v. United States, it was determined that a Fourth Amendment violation occurs when the government violates a person’s “reasonable expectation of privacy.”[13] The government also supported its argument with the third-party doctrine, which holds that there are “no fourth amendment protections for information voluntarily disclosed to third parties,” as first established in the 1976 case United States v. Miller.[13] In the 1979 case Smith v. Maryland, the Court ruled that “individual phone users have no reasonable expectation of privacy in the records of their phone calls, the numbers called and the duration of the calls, since phone users must know that the third party, the company itself, has access to the information.”[16]

Revisions

In 2015, the USA Freedom Act was enacted after several provisions of the USA Patriot Act, including Section 215, expired. This reform placed constraints on Section 215 and ended the government’s collection of “bulk metadata.”[17]

Metadata through the OCAP Framework

Framework for working with Indigenous data. via David Henry[18]

OCAP, or Ownership, Control, Access, and Possession, is a “set of standards that establish important ground rules for how data can be collected, protected, used or shared.”[19] The principles are designed to ensure that underserved and Indigenous communities own, protect, and control how data about them is used.[20] Ownership reflects the principle that a community collectively owns its data; control reflects the principle that communities have the right to control all aspects of data management that affect them; access reflects the principle that communities must have the right to access data about them regardless of where that data is held; and possession reflects concrete, physical control of the data and is a means by which ownership can be asserted and protected.[21]

Data sharing is a foundational norm in many disciplines, and increased emphasis on metadata sharing by funding agencies and publishers is encouraging all disciplines to make it a priority. Yet there is also a common understanding of “sensitive” metadata: metadata that should not be openly shared by default because it contains personally identifying information, a community's traditional or local knowledge, or information that could harm individuals or communities. OCAP reflects a worldview in which all knowledge is connected and in which communities, rather than individuals, hold rights and interests in their information and metadata.[22] It recognizes that ownership, control, access, and possession of information and metadata are critical to maintaining and developing communities' languages, cultures, and histories. In addition, using OCAP as a framework for discussing appropriate and ethical metadata sharing gives all individuals and communities involved a common language for expressing their interests and concerns. It fosters a shared understanding of the concepts of data and information and creates a safe space for discussing the problematic history between communities and those who have collected or created metadata by, from, on, or about them.[23]

Ethical Implications

Implications of Metadata Taxonomies and Classification Systems

Because metadata is designed to represent information and provide a way to search it, the biases and ethical concerns surrounding that information can become embedded in the organizational structure itself.[24][25] Scholars have questioned whether such embedded biases inhibit equitable use of a metadata system, which could interfere with information retrieval and discovery.[24] One particularly problematic case is the handling of gender identity metadata in outdated and inflexible taxonomies.[25]

Because ease of retrieval is a major purpose of metadata, the drive to build vast metadata libraries sometimes prioritizes "the greater good" of a society over an individual's identification preferences.[26] Metadata practitioners have an obligation to represent a subject fairly, because metadata can affect how users understand the subject or its source. In the case of gender identity metadata, the available categories may allow neither accurate representation nor the possibility of fluidity.[25] Because new metadata standards often build on existing ones, a lack of fair representation can become a perpetual issue when those existing standards are insufficient.[24] An additional issue is the hierarchical effect of classification structures, which raises the importance of some aspects of a subject over others.[25]

A metadata system also has a responsibility to be understandable to users, for example by making clear which metadata fields are available, how to search by keyword, and how to combine search queries.[27] However, political motivations can undermine a system's accessibility, as some have speculated was the case in the creation of XML.[24]
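Part of what makes a metadata system understandable is that users can tell which fields exist and how query terms combine. A minimal sketch of fielded keyword search in Python, assuming a small in-memory collection; the records, field names, and the choice to combine criteria with AND are hypothetical, and real catalog systems expose far richer query languages.

```python
records = [
    {"title": "Campus Map", "creator": "Facilities Office", "year": "2019", "format": "image"},
    {"title": "Budget Report", "creator": "Finance Office", "year": "2020", "format": "pdf"},
    {"title": "Campus Tour", "creator": "Admissions Office", "year": "2020", "format": "video"},
]


def search(records: list[dict], **criteria: str) -> list[dict]:
    """Return records whose fields contain every given keyword (case-insensitive).

    Each keyword argument names a metadata field; a record matches only if
    all criteria match, which corresponds to combining query terms with AND.
    A field missing from a record is treated as empty, so it never matches.
    """
    results = []
    for record in records:
        if all(value.lower() in record.get(name, "").lower()
               for name, value in criteria.items()):
            results.append(record)
    return results


print([r["title"] for r in search(records, title="campus")])
print([r["title"] for r in search(records, title="campus", year="2020")])
```

The first query returns both campus items and the second narrows to the 2020 tour; a user who does not know that `year` is a searchable field, or that terms are ANDed, cannot form the second query at all.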

Implications in Privacy

Edward Snowden, a former computer intelligence consultant for the CIA

One concern about metadata is the threat it poses to the security of users' information. While responsibilities for protecting the privacy of user data in general are better understood and more firmly established, with 120 countries having passed specific data privacy laws as of 2017,[28] the situation surrounding metadata is much less clear. Often, an individual's privacy is sacrificed in favor of broader access to information.[26] Culture also shapes both how privacy is defined and how much it matters, and these differences are not necessarily accounted for in a metadata system designed in another region.[26]

Documents leaked by Edward Snowden and reported in 2014 revealed that the National Security Agency in the United States had been deeply involved in recording and logging phone calls in foreign countries; whistleblower Russell Tice has also claimed that the NSA collects the content and metadata of all digital communications.[29] The significance of this large-scale storage of users’ metadata is that patterns in it can be used to make inferences about user activity and identity without direct evidence.[1] For example, the metadata on photos can be used to reconstruct a user's location over time.[30]
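To illustrate how patterns in metadata can support inferences without any content, the following minimal Python sketch profiles one phone number from a handful of call detail records containing only numbers, start times, and durations. The records are invented (555-01xx numbers are reserved for fiction), and the sketch is not a reconstruction of any agency's actual method.

```python
from collections import Counter
from datetime import datetime

# Hypothetical call detail records: (caller, recipient, start time, duration in seconds).
# None of them contain what was said on the call.
calls = [
    ("555-0101", "555-0199", "2021-03-01 08:05", 120),
    ("555-0101", "555-0199", "2021-03-02 08:10", 95),
    ("555-0101", "555-0142", "2021-03-02 22:40", 600),
    ("555-0101", "555-0199", "2021-03-03 08:02", 110),
    ("555-0101", "555-0142", "2021-03-04 23:15", 900),
]


def contact_profile(records, person):
    """Count whom `person` calls and at which hours, using metadata alone."""
    contacts = Counter()
    hours = Counter()
    for caller, recipient, start, _duration in records:
        if caller != person:
            continue
        contacts[recipient] += 1
        hours[datetime.strptime(start, "%Y-%m-%d %H:%M").hour] += 1
    return contacts, hours


contacts, hours = contact_profile(calls, "555-0101")
print(contacts.most_common(1))  # [('555-0199', 3)]
print(sorted(hours))            # [8, 22, 23]
```

Even this tiny sample shows a routine of short morning calls to one number and longer late-night calls to another, the kind of inference about activity and relationships that the paragraph above describes.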

Additional complications arise from encryption and from companies selling data. User data such as email content is often encrypted so that only the sender and recipient can view it, but the associated metadata is often not encrypted. This allows companies to sell or share the metadata associated with encrypted data, which could be used to draw conclusions that would otherwise be impossible.[31]

Implications in Law

One field particularly concerned with the ethics of metadata use is law, where metadata plays a large role in discovery and litigation. There has been significant debate over the ethics of information discovered accidentally during legal proceedings because metadata was handled improperly.[32] Metadata can unknowingly expose information protected by attorney-client privilege, which has made the use of metadata harvesting utilities a topic of debate.[33]
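Word-processing files are a common source of this kind of inadvertent disclosure. A .docx file is a ZIP archive whose core-properties part, typically `docProps/core.xml`, records properties such as the original author and the last person to modify the document. A minimal Python sketch using only the standard library; the sample document and names are invented, and it reads only core properties, not tracked changes or comments, which can reveal far more.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in the core-properties part of Office Open XML files.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def read_core_properties(docx_bytes: bytes) -> dict:
    """Return authorship and timestamp properties from a .docx file's bytes."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    wanted = {
        "creator": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "created": "dcterms:created",
        "modified": "dcterms:modified",
    }
    props = {}
    for label, tag in wanted.items():
        element = root.find(tag, NS)
        if element is not None:
            props[label] = element.text
    return props


# Build a stand-in .docx that contains only its core-properties part (invented values).
CORE_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:creator>A. Partner</dc:creator>
  <cp:lastModifiedBy>Client CFO</cp:lastModifiedBy>
  <dcterms:created xsi:type="dcterms:W3CDTF">2021-03-01T09:00:00Z</dcterms:created>
  <dcterms:modified xsi:type="dcterms:W3CDTF">2021-03-02T17:30:00Z</dcterms:modified>
</cp:coreProperties>"""

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", CORE_XML)

print(read_core_properties(buffer.getvalue()))
```

A recipient running something like this on a draft sent by opposing counsel would learn that a client executive last edited it, which is exactly the kind of information the bar association opinions below disagree about using.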

Current and Potential Solutions

The GDPR is among the most aggressive data protection laws in the world.[34]

There have been a variety of approaches to addressing the ethical implications of metadata, both within and across fields and applications. Privacy laws have been enacted to limit the ability of metadata to expose users’ private information, although the breadth of these laws varies greatly by region. The European Union’s General Data Protection Regulation (GDPR), which took effect in 2018, treats any data that could be used to identify a user, including metadata, as personal data subject to its protections.[35] In the United States, by contrast, the government was authorized to perform bulk collection of phone metadata under the USA Patriot Act, and the USA Freedom Act of 2015, while prohibiting bulk collection, still permitted a narrower call detail records program.[36] Some companies have begun to strip metadata from user data; for example, Instagram removes location data from photos uploaded to the site.[37]
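Stripping location data before publishing a photo can be sketched as a simple filter, assuming the metadata has already been read into a Python dictionary keyed by Exif tag name. The tag list is an illustrative subset, not a complete inventory, and this is not how Instagram or any specific service implements the practice; a real pipeline would also need an image library to rewrite the file without those tags.

```python
# Exif tag names that reveal where a photo was taken (illustrative, not exhaustive).
LOCATION_TAGS = {
    "GPSInfo", "GPSLatitude", "GPSLatitudeRef",
    "GPSLongitude", "GPSLongitudeRef", "GPSAltitude",
}


def strip_location(metadata: dict) -> dict:
    """Return a copy of `metadata` with location-related tags removed."""
    return {tag: value for tag, value in metadata.items() if tag not in LOCATION_TAGS}


photo = {
    "Make": "ExampleCam",
    "DateTimeOriginal": "2021:03:25 14:02:11",
    "GPSLatitude": (42, 16, 48.0),
    "GPSLatitudeRef": "N",
    "GPSLongitude": (83, 44, 24.0),
    "GPSLongitudeRef": "W",
}

print(strip_location(photo))
# {'Make': 'ExampleCam', 'DateTimeOriginal': '2021:03:25 14:02:11'}
```

Returning a new dictionary leaves the original metadata untouched. Note that the timestamp survives, so a service concerned with re-identification might strip more than location.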

In the field of law, as of 2015, 14 bar associations in the United States had issued ethics opinions on lawyers' use of metadata.[38] These opinions differ greatly: the American Bar Association, the Maryland State Bar Association, and the Vermont Bar Association outright permit metadata mining, while the New York State Bar Association Committee on Professional Ethics went so far as to say that “lawyers may not ethically use available technology to surreptitiously examine and trace e-mail and other electronic documents.”[39]


References

  1. Kassner, M. (2013, August 19). Is metadata collected by the government a threat to your privacy? Retrieved March 26, 2021, from https://www.techrepublic.com/blog/it-security/is-metadata-collected-by-the-government-a-threat-to-your-privacy/
  2. Nolan, D. (2014, August 6). Explainer: What is metadata? Should I worry about mandatory data retention? Retrieved March 26, 2021, from https://www.theguardian.com/commentisfree/2014/aug/06/explainer-what-is-metadata-should-i-worry-about-mandatory-data-retention
  3. Chadwick, K. N. (2017, May 4). The ethics of metadata mining: Ethics Opinion 665 raises more questions than answers. Retrieved March 25, 2021, from https://www.martindale.com/litigation-law/article__2245212.htm/
  4. Kononow, P. (2018, September 16). What is metadata (with examples) - data terminology. Retrieved March 25, 2021, from https://dataedo.com/kb/data-glossary/what-is-metadata
  5. Foote, K. (2021, February 1). A brief history of metadata. Retrieved March 12, 2021, from https://www.dataversity.net/a-brief-history-of-metadata/#
  6. Riley, J. (2017). Understanding metadata. Washington, DC: National Information Standards Organization. Retrieved March 25, 2021, from http://www.niso.org/publications/press/UnderstandingMetadata.pdf
  7. Video metadata example. (2018, October 15). Stack Overflow. Retrieved from stackoverflow.com/questions/52816497/write-properties-metadata-comments-title-etc-to-a-video-php-curl
  8. IPTC. (2018, September 27). Photo metadata. Retrieved from iptc.org/standards/photo-metadata/
  9. TechTarget Contributor. (2015, June 11). What is image metadata? WhatIs.com. Retrieved from whatis.techtarget.com/definition/image-metadata
  11. Vidispine. (n.d.). What is video metadata management? Retrieved from www.vidispine.com/video-metadata-management
  12. McDowell, G. (2013, August 13). What can you learn from an email header (metadata)? MUO. Retrieved from www.makeuseof.com/tag/what-can-you-learn-from-an-email-header-metadata/
  13. Mornin, J. D. (2014). NSA metadata collection and the Fourth Amendment. Berkeley Technology Law Journal, 29(Annual Review), 985-1006. Retrieved from https://heinonline-org.libproxy.law.umich.edu/HOL/Page?collection=journals&handle=hein.journals/berktech29&id=1007&men_tab=srchresults
  14. Jaeger, P. T., Bertot, J. C., & McClure, C. R. (2003). The impact of the USA Patriot Act on collection and analysis of personal information under the Foreign Intelligence Surveillance Act. Government Information Quarterly, 20(3), 295-314. doi:10.1016/s0740-624x(03)00057-1
  15. McGowan, C. J. (2014). The relevance of relevance: Section 215 of the USA Patriot Act and the NSA metadata collection program. Fordham Law Review, 82(5), 2399-2442. Retrieved from https://heinonline-org.libproxy.law.umich.edu/HOL/Page?collection=journals&handle=hein.journals/flr82&id=2439&men_tab=srchresults#
  16. Barnett, R. (2015). Why the NSA data seizures are unconstitutional. Harvard Journal of Law & Public Policy, 38(1), 3-20. Retrieved from https://heinonline-org.libproxy.law.umich.edu/HOL/Page?handle=hein.journals/hjlpp38&div=5&id=&page=&collection=journals#
  17. Suarez, S. (2017). Is America safer? The USA FREEDOM Act of 2015 and what the FBI and NSA have, can, and should be doing. Law School Student Scholarship, 882. https://scholarship.shu.edu/student_scholarship/882
  18. Walking the path together: Indigenous health data at ICES [Scientific figure]. ResearchGate. Retrieved April 6, 2021, from https://www.researchgate.net/figure/Framework-for-working-with-Indigenous-data_fig1_323984147
  19. First Nations Information Governance Centre. (n.d.). Understanding the First Nations principles of OCAP. Ottawa, ON: The First Nations Information Governance Centre. Accessed April 6, 2021.
  20. Ibid. Accessed April 6, 2021.
  21. Hennessy, K. (2009). Virtual repatriation and digital cultural heritage: The ethics of managing online collections. Anthropology News, 50(4), 5-6. Accessed April 6, 2021.
  22. First Nations Information Governance Centre. (2014b). Barriers and levers for the implementation of OCAP. The International Indigenous Policy Journal, 5(2), 1-11. Accessed April 6, 2021.
  23. Duarte, M. E., & Belarde-Lewis, M. (2015). Imagining: Creating spaces for Indigenous ontologies. Cataloging & Classification Quarterly, 53(5/6), 677-702. Accessed April 6, 2021.
  24. Brody, R. (2003). Information ethics in the design and use of metadata. IEEE Technology and Society Magazine, 22(2), 34-39. doi:10.1109/mtas.2003.1216241
  25. Roberto, K. R. (2011). Inflexible bodies: Metadata for transgender identities. Journal of Information Ethics, 20(2), 56-64. doi:http://dx.doi.org.proxy.lib.umich.edu/10.3172/JIE.20.2.56
  26. Seeman, D. (2012). Naming names: The ethics of identification in digital library metadata. Knowledge Organization, 39(5), 325-331. https://doi-org.proxy.lib.umich.edu/10.5771/0943-7444-2012-5-325
  27. Haynes, D. (2018). Metadata for information management and retrieval: Understanding metadata and its use. Facet Publishing.
  28. Greenleaf, G. (2017, January 30). Global data privacy laws 2017: 120 national data privacy laws, including Indonesia and Turkey. Privacy Laws & Business International Report, 145, 10-13. UNSW Law Research Paper No. 17-45. Retrieved March 25, 2021, from https://ssrn.com/abstract=2993035
  29. Griffin, T. (2020, December 28). NSA recorded the content of 'every single' call in a foreign country ... and also in America? Retrieved March 12, 2021, from https://washingtonindependent.com/2014/03/nsa-recorded-every-single-call-one-country-country-america/
  30. Matthews, R. (2017, June 23). Image forensics: What do your photos and their metadata say about you? ABC News. Retrieved March 25, 2021, from https://www.abc.net.au/news/2017-06-23/what-your-photos-and-their-metadata-say-about-you/8642630
  31. (2019, October 28). Your data is shared and sold... what's being done about it? Retrieved March 12, 2021, from https://knowledge.wharton.upenn.edu/article/data-shared-sold-whats-done/
  32. Tremolada, R. (2018). The legal ethics of metadata: Accidental discovery of inadvertently sent metadata and the ethics of taking advantage of others' mistakes. Rich. J.L. & Tech., 25, 1.
  33. (Reference text not provided in the source; ref named LLRX.)
  34. G. (2018, May 20). Part 2: The new data protection law - GDPR, and how it impacts eCommerce businesses. Retrieved March 25, 2021, from https://blog.plugnpaid.com/part-2-the-new-data-protection-law-gdpr-and-how-it-impacts-ecommerce-businesses/
  35. GDPR.eu. (2016). Complete guide to GDPR compliance. Retrieved March 11, 2021, from https://gdpr.eu/
  36. US: End bulk data collection program. (2020, October 28). Retrieved March 12, 2021, from https://www.hrw.org/news/2020/03/05/us-end-bulk-data-collection-program#:~:text=The%20USA%20Freedom%20Act%20prohibits,detail%20records%20(CDR)%20program.
  37. Random. (2019, January 25). Does Instagram remove EXIF data from images? Alphr. Retrieved March 25, 2021, from https://www.alphr.com/instagram-remove-exif-data-images
  38. Perlman, A. M. (2010). The legal ethics of metadata mining. Akron Law Review, 43(3), Article 7. Retrieved March 12, 2021, from http://ideaexchange.uakron.edu/akronlawreview/vol43/iss3/7
  39. Opinion 749. (2020, June 22). Retrieved March 12, 2021, from https://nysba.org/opinion-749/