Metadata

From SI410
Revision as of 14:32, 12 March 2021 by Babbittw (Talk | contribs) (added privacy implications section)

Metadata, specifically the metadata of digital content, can broadly be defined as data about data. More specifically, it is the collection of information that helps summarize the history and current state of various forms of digital information, including but not limited to email, cellular phones, social media accounts and posts, digital files, and software applications.[1] Metadata is often created as a direct result of viewing and creating other data,[2] frequently implicitly and without the knowledge of the user. Metadata enables many of the features that users see in programs, but it potentially comes at the cost of data privacy.[1] New methods of automatically scraping metadata could put users' data at risk without their knowledge, and there is still no clear consensus on what to do about it or where the law stands on the issue.[3]


What Metadata Is

Metadata is data that describes other pieces of data. Historically, the term referred to the information kept in physical form for indexing and organizing libraries. As the use of digital data has continued to rise, it has been adopted to describe the additional information stored alongside digital data, which fulfills a role similar to that of its physical counterpart. (Brief History ref)

Currently, the term can describe a wide variety of information. The National Information Standards Organization (NISO) divides metadata into three categories based on its use: descriptive, structural, and administrative. (NISO ref)

Descriptive: “describes a resource for purposes such as discovery and identification.”

  • This includes identifying factors such as the name of the document, its author, or related search terms.

Structural: “indicates how compound objects are put together.”

  • This includes things like page numbers or linking data.

Administrative: “provides information to help manage a resource.”

  • This includes data such as time of creation, creation methods, permissions, and other technical information.


These pieces of information are useful for a variety of purposes, including resource discovery, organization of resources, interoperability, digital and unique identification, and preservation of data. (NISO ref) Preserving metadata can therefore be important to the structure of digital databases of all kinds.
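As a rough illustration of the three categories, the sketch below groups the metadata of a single hypothetical document into descriptive, structural, and administrative parts. The class names, field names, and sample values are assumptions chosen for this example; they are not defined by NISO.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Descriptive:
    # Helps people find and identify the resource.
    title: str
    author: str
    keywords: List[str] = field(default_factory=list)


@dataclass
class Structural:
    # Describes how the parts of a compound object fit together.
    page_count: int
    part_of: str = ""


@dataclass
class Administrative:
    # Supports management of the resource.
    created: str
    file_format: str
    permissions: str


@dataclass
class MetadataRecord:
    descriptive: Descriptive
    structural: Structural
    administrative: Administrative


# Hypothetical record for a single PDF document.
record = MetadataRecord(
    descriptive=Descriptive("Course Syllabus", "J. Doe", ["syllabus", "ethics"]),
    structural=Structural(page_count=12, part_of="Course Packet"),
    administrative=Administrative("2021-03-12T14:32:00", "application/pdf", "read-only"),
)

# Convert to nested dictionaries, e.g. for storage in a database.
record_dict = asdict(record)
print(record_dict["administrative"]["created"])
```

Keeping the categories separate in this way makes it easier to decide, for example, which fields should be searchable (descriptive) and which should be restricted (administrative).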

Included Information

Because the types of metadata, and of data in general, differ so widely, exactly what metadata stores depends largely on the data it is associated with. Examples of metadata for different types of associated data include (techrepublic):

  • Email(techrepublic)
    • Sender’s name, email, and IP address
    • Recipient’s name and email address
    • Subject line
  • Cellular Phones(techrepublic)
    • Phone number of every call received
    • Time of call
    • Duration of call
    • Location of caller and recipient
  • Web Browsers(techrepublic)
    • User’s IP address, ISP, device, and OS
    • Browser history
    • Cached data from websites
    • User login details from auto-fill

A common theme among the different types of metadata is that they store utility information that applications can use to perform further functions. However, metadata also often includes more identifying information than may be obvious to the creator or distributor of the data. (Law example ref)
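The email fields listed above can be read directly from a message's headers. The following is a minimal sketch using Python's standard `email` package. The sample message, its addresses, and the function name `extract_email_metadata` are made up for illustration, and the IP address comes from a range reserved for documentation.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical raw message: headers (metadata) first, then a blank line, then the body.
RAW_MESSAGE = """\
Received: from mail.example.org (mail.example.org [203.0.113.7])
\tby mx.example.com; Fri, 12 Mar 2021 14:32:00 -0500
From: Alice Example <alice@example.org>
To: Bob Example <bob@example.com>
Subject: Lunch on Friday?
Date: Fri, 12 Mar 2021 14:31:58 -0500

The body is the content; everything above it is metadata.
"""

# Matches a bracketed IPv4 address such as [203.0.113.7].
IP_PATTERN = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def extract_email_metadata(raw):
    """Return sender, recipient, subject, and relay IPs without reading the body."""
    msg = message_from_string(raw)
    sender_name, sender_address = parseaddr(msg["From"])
    recipient_name, recipient_address = parseaddr(msg["To"])
    relay_ips = []
    for received in msg.get_all("Received", []):
        relay_ips.extend(IP_PATTERN.findall(received))
    return {
        "sender_name": sender_name,
        "sender_address": sender_address,
        "recipient_name": recipient_name,
        "recipient_address": recipient_address,
        "subject": msg["Subject"],
        "relay_ips": relay_ips,
    }


meta = extract_email_metadata(RAW_MESSAGE)
print(meta["sender_address"], meta["relay_ips"])
```

Note that none of these values require access to the message body, which is why header metadata can remain revealing even when the content itself is protected.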

Collection

Metadata can be collected and viewed in two main ways: manually and automatically. The manual approach involves opening digital files or other data in some form of viewing or editing software and looking for metadata saved along with it. The automatic method, also known as metadata discovery or harvesting (Wiki Ref), has larger implications because of the volume of data it can process. Companies such as Octopai have begun to use machine learning to manage the metadata within a dataset, and even to map out connections and trends in that metadata. (dataversity ref)
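A very simple form of automatic harvesting can be sketched with Python's standard library: walk a directory tree and record the file system metadata of every file, without opening any of them. The function name `harvest_file_metadata` and the choice of fields are assumptions for this example, and commercial tools such as those mentioned above are far more elaborate.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def harvest_file_metadata(root):
    """Collect basic file system metadata for every file under root."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        info = path.stat()  # reads metadata only, not file contents
        records.append({
            "name": path.name,
            "size_bytes": info.st_size,
            "modified_utc": datetime.fromtimestamp(
                info.st_mtime, tz=timezone.utc
            ).isoformat(),
            "mode": oct(info.st_mode & 0o777),  # permission bits
        })
    return records


# Demonstration on a temporary directory containing one small file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "notes.txt").write_text("hello")
    demo_records = harvest_file_metadata(tmp)

for entry in demo_records:
    print(entry["name"], entry["size_bytes"], entry["modified_utc"])
```

Because the loop never reads file contents, it scales to large collections, which is part of what gives automatic harvesting its larger implications.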

Ethical Implications

Implications in Privacy

One concern about metadata is the threat that it poses to the security of users' information. Responsibilities for user privacy are better understood and more firmly established for data in general, with 120 countries having passed specific laws to protect data privacy as of 2017 (Global Data Privacy ref), but the situation surrounding metadata is much less clear.

The 2013 Edward Snowden leak revealed that the National Security Agency (NSA) in the United States has been deeply involved in recording and logging phone calls in foreign countries. There is also evidence from the whistleblower Russell Tice suggesting that the NSA is collecting the content and metadata of all digital communications. (washington_independent ref) The significance of this massive storage of users' metadata is that trends and patterns can be used to make assumptions about user activity and identity even without direct evidence. (techrepublic ref)
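To make the point about trends and patterns concrete, the sketch below summarizes a small, entirely fictional call log that contains only numbers, start times, and durations. The data, the late-night threshold, and the function name `summarize_contacts` are assumptions for illustration, not a description of any agency's actual methods.

```python
from collections import Counter
from datetime import datetime

# Fictional call metadata: (number called, start time, duration in minutes).
CALL_LOG = [
    ("555-0101", "2021-03-01T23:10", 12),
    ("555-0101", "2021-03-03T23:45", 30),
    ("555-0199", "2021-03-04T09:15", 2),
    ("555-0101", "2021-03-05T13:00", 5),
]


def summarize_contacts(call_log):
    """Count total and late-night calls per number using metadata alone."""
    calls_per_number = Counter()
    late_night_calls = Counter()
    for number, started, _minutes in call_log:
        calls_per_number[number] += 1
        hour = datetime.fromisoformat(started).hour
        if hour >= 22 or hour < 6:  # assumed definition of "late night"
            late_night_calls[number] += 1
    return calls_per_number, late_night_calls


calls_per_number, late_night_calls = summarize_contacts(CALL_LOG)
print(calls_per_number.most_common(1))
print(dict(late_night_calls))
```

Even in this toy example, the frequency and timing of calls hint at which contact is closest to the caller, although no call content was examined.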

Additional complications arise when considering encryption and the selling of data by companies. User data, such as email content, is often encrypted so that only the creator and receiver of the data are able to view the content; metadata, however, is often not encrypted. This gives companies the ability to share or sell the metadata associated with encrypted data, which could be used to reach conclusions that would otherwise be impossible to make. (wharton ref)

Implications in Law

How it is Addressed

See Also

References

  1. "Is metadata collected by the government a threat to your privacy?" TechRepublic. Retrieved March 11, 2021.
  2. "Explainer: what is metadata? Should I worry about mandatory data retention?" Guardian. Retrieved March 11, 2021.
  3. "The Ethics of Metadata Mining: Ethics Opinion 665 Raises More Questions than Answers" Martindale Legal Library. Retrieved March 11, 2021.