Metadata

From SI410
Revision as of 01:07, 19 March 2021 by Mahirpiy (Talk | contribs) (What Metadata Is)

Metadata, specifically the metadata of digital content, can broadly be defined as data about data. More specifically, it is the collection of information that summarizes the history and current state of various forms of digital information, including but not limited to email, cellular phones, social media accounts and posts, digital files, and software applications.[1] Metadata is often created as a direct result of viewing and creating other data,[2] frequently implicitly and without the knowledge of the user. Metadata enables many of the features that users see in programs; however, it potentially comes at the cost of data privacy.[1] New methods of automatically scraping metadata could put users' data at risk without their knowledge, and there is still no clear consensus on what to do about it or where the law stands on the issue.[3]


What Metadata Is

Metadata is not the data itself, but the additional information about the data.

Metadata is data that describes other pieces of data. Historically, metadata described the information used in a physical format for the indexing and organization of libraries, but as the use of digital data has continued to rise, the term has been adopted to describe the additional information that accompanies digital data and fulfills a role similar to that of its physical counterpart.[4]

Currently, the term can describe a wide variety of information. The National Information Standards Organization divides metadata into four categories based on its use: descriptive metadata, structural metadata, administrative metadata, and markup languages.[5]

Descriptive: “For finding or understanding a resource.”

  • This includes identifying factors such as the name of the document, the author of it, or related search terms.

Structural: “Relationships of parts of resources to one another”.

  • This includes things like page numbers or linking data.

Administrative: A combination of technical, preservation, and rights information used to manage a resource.

  • This includes data such as time of creation, creation methods, permissions, and other technical information.

Markup Languages: "Integrates metadata and flags for other structural or semantics features within context."

  • This includes information about sections of a piece of data such as lists, paragraphs, etc.


These pieces of information can be useful for a variety of purposes, including resource discovery, organization of resources, interoperability, digital and unique identification, and preservation of data.[5] The preservation of metadata can therefore be important in the structure of digital databases of all kinds.

Included Information

Because types of metadata, and of data in general, differ so widely, exactly what a given piece of metadata stores depends largely on the data it is associated with. Examples of metadata for different types of associated data include:

  • Email[1]
    • Sender’s name, email, and IP address
    • Recipient’s name and email address
    • Subject line
  • Cellular Phones[1]
    • Phone number of every call received
    • Time of call
    • Duration of call
    • Location of caller and recipient
  • Web Browsers[1]
    • User’s IP address, ISP, device, and OS
    • Browser history
    • Cached data from websites
    • User login details from auto-fill
  • Digital Photographs
    • Headline/captions
    • Owner/rights and licenses
    • Capture location
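
The email entries in the list above can be illustrated with a short sketch. It uses Python's standard `email` package to read header fields from a raw message without touching the body. The message text, names, and addresses are invented for illustration, and `extract_email_metadata` is a hypothetical helper, not part of any library.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical raw message; every name and address here is invented.
RAW_MESSAGE = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.org>
Subject: Lunch on Friday
Date: Fri, 12 Mar 2021 09:30:00 -0500
Received: from mail.example.com ([192.0.2.10]) by mx.example.org

See you at noon.
"""

def extract_email_metadata(raw):
    """Return header fields only; the message body is never read."""
    msg = message_from_string(raw)
    sender_name, sender_addr = parseaddr(msg["From"])
    recipient_name, recipient_addr = parseaddr(msg["To"])
    return {
        "sender_name": sender_name,
        "sender_email": sender_addr,
        "recipient_name": recipient_name,
        "recipient_email": recipient_addr,
        "subject": msg["Subject"],
        "date": msg["Date"],
        # Relay hops, which commonly carry server names and IP addresses.
        "received": msg.get_all("Received", []),
    }

metadata = extract_email_metadata(RAW_MESSAGE)
```

Everything in `metadata` comes from the headers alone, which is why the sender, recipient, subject, and relay addresses can be known without the message content ever being read.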

A common theme among the different types of metadata is that they store utility information that applications can use to perform further functions. However, metadata often also includes more identifying information than may be obvious to the creator or distributor of the data.[6]

Collection

The collection and viewing of metadata can be done in two main ways: manually and automatically. The manual approach involves opening digital files or other data in some form of viewing or editing software and looking for metadata saved along with it. The automatic method, also known as metadata discovery or harvesting, has larger implications due to the volume of data it can process. Companies such as Octopai have begun to use machine learning to manage metadata within a dataset, and even to map out connections and trends in the metadata.[4]
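
As a rough illustration of automatic harvesting, the sketch below walks a directory tree and records file-system metadata for every file without opening any of them. It is a minimal example built on Python's standard library and is not a description of how Octopai or any commercial tool works; `harvest_file_metadata` and the record field names are assumptions made for this example.

```python
import os
import tempfile
from datetime import datetime, timezone

def harvest_file_metadata(root):
    """Walk `root` and collect file-system metadata without opening any file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            records.append({
                "path": path,
                "size_bytes": info.st_size,
                "modified_utc": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc
                ).isoformat(),
                # Permission bits only, e.g. '0o644' on many Unix systems.
                "permissions": oct(info.st_mode & 0o777),
            })
    return records

# Usage on a throwaway directory containing a single five-character file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "notes.txt"), "w") as handle:
        handle.write("hello")
    harvested = harvest_file_metadata(tmp)
```

The same loop runs unchanged over thousands of files, which is the difference in scale between manual inspection and harvesting.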

Ethical Implications

Implications in Privacy

One concern about metadata is the threat it poses to the security of users' information. Responsibilities for user privacy are better understood and more firmly established for data in general, with 120 countries having passed specific laws to protect data privacy as of 2017,[7] but the situation surrounding metadata is much less clear.

A 2014 report based on the Edward Snowden leaks revealed that the National Security Agency (NSA) in the United States has been deeply involved in the recording and logging of phone calls in foreign countries; however, there is also evidence from the whistleblower Russell Tice that the NSA is collecting the content and metadata of all digital communications.[8] The significance of this mass storage of users’ metadata is that trends and patterns can be used to make assumptions about user activity and identity even without direct evidence.[1]
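
A small sketch can show how patterns emerge from call metadata alone. The records below are invented, and the "late night" window of 22:00 to 06:00 is an arbitrary assumption chosen for illustration; `summarize_contacts` is a hypothetical helper and does not represent any agency's actual method.

```python
from datetime import datetime

# Hypothetical call records: (number called, start time, duration in seconds).
CALL_RECORDS = [
    ("555-0101", "2021-03-01 23:40", 1800),
    ("555-0101", "2021-03-03 23:15", 2400),
    ("555-0199", "2021-03-04 10:05", 120),
    ("555-0101", "2021-03-06 23:55", 2100),
]

def summarize_contacts(records):
    """Aggregate call count, total talk time, and late-night calls per number."""
    summary = {}
    for number, start, duration in records:
        hour = datetime.strptime(start, "%Y-%m-%d %H:%M").hour
        entry = summary.setdefault(
            number, {"calls": 0, "seconds": 0, "late_night": 0}
        )
        entry["calls"] += 1
        entry["seconds"] += duration
        # Assumed window: calls starting at 22:00 or later, or before 06:00.
        if hour >= 22 or hour < 6:
            entry["late_night"] += 1
    return summary

summary = summarize_contacts(CALL_RECORDS)
most_frequent = max(summary, key=lambda number: summary[number]["calls"])
```

No call content is involved, yet the summary already suggests a close relationship with one number: repeated, long, late-night calls.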

Additional complications arise when considering encryption and the selling of data by companies. User data, such as email content, is often encrypted so that only the creator and receiver of the data are able to view the content; metadata, however, is often not encrypted. This gives companies the ability to share the metadata associated with encrypted data, which could be used to draw conclusions that would otherwise be impossible to make.[9]

Implications in Law

A field that is particularly concerned with the ethics of metadata use is law, specifically discovery and litigation. There has been significant questioning of the ethical implications of accidental discovery of information during legal proceedings due to the improper handling of metadata.[10] Metadata can inadvertently expose information held under attorney-client privilege, which has made the use of metadata harvesting utilities a topic of some debate.[6]

How it is Addressed

The GDPR is among the most aggressive data protection laws in the world.

There have been a variety of approaches to addressing the ethical issues of metadata, both within and across fields and applications. Privacy laws have been enacted to limit the ability of metadata to expose users’ private information, although the breadth of these laws varies greatly by region. The European Union’s General Data Protection Regulation, which took effect in 2018, states that all data that could be used to identify a user must be anonymized, including metadata,[11] whereas the United States is currently authorized to perform bulk collection of phone metadata under the USA Patriot Act.[12]

As for the implications in law, as of 2015, 14 bar associations in the United States had issued ethics opinions on the use of metadata by lawyers.[13] These opinions vary widely: the American Bar Association, the Maryland State Bar, and the Vermont State Bar outright permit metadata mining, while the New York State Bar Association Committee on Professional Ethics goes as far as to say that “lawyers may not ethically use available technology to surreptitiously examine and trace e-mail and other electronic documents.”[14]

See Also

References

  1. "Is metadata collected by the government a threat to your privacy?" TechRepublic. Retrieved March 11, 2021.
  2. "Explainer: what is metadata? Should I worry about mandatory data retention?" Guardian. Retrieved March 11, 2021.
  3. "The Ethics of Metadata Mining: Ethics Opinion 665 Raises More Questions than Answers" Martindale Legal Library. Retrieved March 11, 2021.
  4. Foote, K. (2021, February 01). A brief history of metadata. Retrieved March 12, 2021, from https://www.dataversity.net/a-brief-history-of-metadata/#
  5. Riley, J. (2017). Understanding metadata. Washington DC, United States: National Information Standards Organization (http://www.niso.org/publications/press/UnderstandingMetadata.pdf), 23.
  6. Calloway, J. (2009, January 05). Metadata – what is it and what are my ethical duties? Retrieved March 12, 2021, from https://www.llrx.com/2009/01/metadata-what-is-it-and-what-are-my-ethical-duties/
  7. Greenleaf, Graham, Global Data Privacy Laws 2017: 120 National Data Privacy Laws, Including Indonesia and Turkey (January 30, 2017). (2017) 145 Privacy Laws & Business International Report, 10-13, UNSW Law Research Paper No. 17-45, Available at SSRN: https://ssrn.com/abstract=2993035
  8. Griffin, T. (2020, December 28). NSA recorded the content of 'every single' call in a foreign country ... and also In America? Retrieved March 12, 2021, from https://washingtonindependent.com/2014/03/nsa-recorded-every-single-call-one-country-country-america/
  9. E. (2019, October 28). Your data is shared and sold...what's being done about it? Retrieved March 12, 2021, from https://knowledge.wharton.upenn.edu/article/data-shared-sold-whats-done/
  10. Riccardo Tremolada, The Legal Ethics of Metadata: Accidental Discovery of Inadvertently Sent Metadata and the Ethics of Taking Advantage of Others’ Mistakes, 25 RICH. J.L. & TECH., no. 4, 2019.
  11. "Complete guide to GDPR compliance" GDPR. Retrieved March 11, 2021.
  12. US: End bulk data collection program. (2020, October 28). Retrieved March 12, 2021, from https://www.hrw.org/news/2020/03/05/us-end-bulk-data-collection-program#:~:text=The%20USA%20Freedom%20Act%20prohibits,detail%20records%20(CDR)%20program.
  13. Perlman, Andrew M. (2010) "The Legal Ethics of Metadata Mining," Akron Law Review: Vol. 43 : Iss. 3 , Article 7. Available at: http://ideaexchange.uakron.edu/akronlawreview/vol43/iss3/7
  14. Opinion 749. (2020, June 22). Retrieved March 12, 2021, from https://nysba.org/opinion-749/