Ethics of Data Mining

From SI410

Data mining is the process of gathering large data sets and applying analytical tools to investigate and interpret them. Its purpose is to discover patterns within these data sets and to develop and test models based on those patterns. Establishing these patterns usually involves a range of predictive models, statistical software, and analysis tools, so data mining can be a time-consuming and expensive process[1]. As technology has advanced, however, data mining has become increasingly prevalent. Many ordinary individuals interact with data mining tools every day, whether through their search history or social media. Through data mining, companies can gain access to intimate user data and use it to generate consumer profiles and intelligence. Data miners often sell the personal information of users and customers to other companies, which can then use that information to coordinate business efforts such as targeted marketing.[2]

Types of Data Mining

Predictive Data Mining

As the name suggests, predictive data mining is done with the purpose of predicting or forecasting trends. When companies buy or gain access to user data, they can sort through it and use forecasting tools to form hypotheses based on the obtained user information. This relies heavily on a business's analytical software, though results from this type of data mining can prove inaccurate or misleading. The most typical use of predictive data mining is trend forecasting[3]. For example, a company may sort through a customer's purchase or search history to create targeted advertisements based on this data. For this reason, this kind of data mining is often described as proactive.
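Trend forecasting can be illustrated with a very small example. The sketch below is not taken from any of the cited sources; it assumes a hypothetical list of monthly purchase counts and fits a straight line to them with ordinary least squares, then extends that line into future months. The function names (fit_linear_trend, forecast_next) and the sample data are illustrative assumptions, and real forecasting tools are considerably more sophisticated.

```python
from statistics import mean


def fit_linear_trend(values):
    """Fit a least-squares line through the points (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(values)
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_bar - slope * x_bar
    return slope, intercept


def forecast_next(values, steps=1):
    """Extend the fitted line to the next `steps` periods after the observed data."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]


# Hypothetical monthly purchase counts for one product category.
monthly_purchases = [10, 12, 14, 16, 18]
print(forecast_next(monthly_purchases, steps=2))  # prints [20.0, 22.0]
```

A straight line is the simplest possible model. It also shows why such predictions can mislead: the forecast simply assumes that the past trend will continue.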

Descriptive Data Mining

Descriptive data mining, on the other hand, takes a more reactive approach. Instead of making predictions about the future from customer data, it analyzes already existing data to identify correlational relationships. Because it summarizes what has already been observed, the results of this type of data mining are comparatively concrete and precise. Businesses gain insight from past data and use it to learn and establish underlying patterns. For example, a company might track user data on its website to see which parts of the site receive the most interaction and which receive the least. From this, the business might be able to identify touch-points on its website that are connected to certain user behavior. Businesses often use a combination of both data mining techniques to best serve their interests[4].
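The website example above amounts to counting past events. As a minimal sketch, assuming a hypothetical log in which each interaction is a dictionary with a "section" field (the field name and sample data are assumptions made here for illustration), the most and least used parts of a site could be found like this:

```python
from collections import Counter


def interaction_counts(events):
    """Count how many logged interactions each site section received."""
    return Counter(event["section"] for event in events)


def most_and_least_used(events):
    """Return the (most used, least used) section names from an event log."""
    counts = interaction_counts(events)
    if not counts:
        raise ValueError("no events to summarize")
    ranked = counts.most_common()  # sorted from highest to lowest count
    return ranked[0][0], ranked[-1][0]


# Hypothetical click log for a company website.
events = [
    {"user": "u1", "section": "pricing"},
    {"user": "u2", "section": "pricing"},
    {"user": "u3", "section": "pricing"},
    {"user": "u1", "section": "blog"},
    {"user": "u2", "section": "blog"},
    {"user": "u3", "section": "careers"},
]
print(most_and_least_used(events))
```

Nothing here is predicted. The output only describes what users have already done, which is what separates descriptive mining from the predictive kind.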

Process of Data Mining

The process of data mining is often lengthy and detail-oriented. From beginning to end, it combines many software tools, models, and people to produce the final outcome.

Establishing a Business Goal

Before a business begins data mining, it must first establish a clear goal that it wants to achieve. This happens before any software is set up and forces the business to clarify what outcome it hopes to attain. For example, a business may hope to increase site engagement, user clicks, or company revenue.

Defining the Data

After establishing a clear goal, the business defines the data. This means researching what kind of data it wants to collect and how it plans to obtain it. Doing so forces the business to think realistically about the scope of the data it will be able to collect and to confront constraints related to storage, collection, and analysis. Businesses also brainstorm all viable sources of data during this step.

Gathering the Data

Once the business has a realistic understanding of how much and what kind of data it can gather, the collection process begins. Businesses can go about this in a number of ways. Many track user behavior on company sites, store personal contact information that consumers have provided, or even obtain more intimate information such as user search history and purchasing behavior. Once the data has been extracted, it goes through a number of steps to ensure that it is accurate, sensible, and standardized. This is done by cleaning the data and removing any records that skew the rest or differ sharply from them[5].
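The cleaning step can be sketched in a few lines. The example below is only an illustration under stated assumptions: the record fields ("email", "order_total"), the rule of keeping one record per email address, and the use of the common 1.5 × interquartile-range rule to flag outliers are all choices made here, not methods described in the cited sources.

```python
from statistics import quantiles


def standardize(records):
    """Normalize emails, drop incomplete records, and keep one record per email."""
    cleaned = []
    seen = set()
    for record in records:
        email = (record.get("email") or "").strip().lower()
        total = record.get("order_total")
        if not email or total is None:
            continue  # incomplete record: skip it
        if email in seen:
            continue  # duplicate customer: keep only the first occurrence
        seen.add(email)
        cleaned.append({"email": email, "order_total": float(total)})
    return cleaned


def drop_outliers(records, key="order_total", k=1.5):
    """Remove records whose value lies more than k * IQR outside the quartiles."""
    if len(records) < 4:
        return list(records)  # too few points to estimate quartiles sensibly
    q1, _, q3 = quantiles([r[key] for r in records], n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [r for r in records if low <= r[key] <= high]


# Hypothetical raw export with a duplicate, a missing value, and one extreme order.
raw = [
    {"email": " Ann@example.com ", "order_total": 20},
    {"email": "ann@example.com", "order_total": 20},
    {"email": "bob@example.com", "order_total": 22},
    {"email": "cat@example.com", "order_total": 21},
    {"email": "dan@example.com", "order_total": 23},
    {"email": "eve@example.com", "order_total": 19},
    {"email": "fay@example.com", "order_total": 500},
    {"email": "", "order_total": 18},
]
clean = drop_outliers(standardize(raw))
print([r["email"] for r in clean])
```

Whether an extreme value is an error or a genuine high-spending customer is a judgment call, so in practice flagged records are often reviewed rather than deleted automatically.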

Developing a Framework

After all the data has been collected and cleaned, the business develops a framework around it. Smaller companies may rely on human data analysts for this, but large corporations usually have well-trained models and software that help identify patterns or relationships within the data. During this step, connections are drawn between the units of data and the outcome that the business hopes to achieve. Predictive models are also used to establish these connections.
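One simple way to check whether a measured behavior is connected to a business outcome is the Pearson correlation coefficient. The sketch below is an illustrative assumption rather than a method named in the cited sources; the variable names and sample figures are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Hypothetical per-user data: minutes spent on a product page vs. items bought.
minutes_on_page = [1, 2, 3, 4, 5, 6]
items_bought = [0, 1, 1, 2, 2, 3]
print(round(pearson_r(minutes_on_page, items_bought), 3))
```

A value near 1 or -1 suggests a strong linear relationship and a value near 0 suggests little or none. A high correlation alone does not show that one behavior causes the other.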

Data Analysis

Once patterns have been identified and relationships have been drawn, the business determines how they can help solve the issue at hand. Do the patterns tell the business anything about customer behavior? Can they be used to improve company marketing efforts? Why do customers consistently click on one part of the company site? These are the types of questions that businesses look to answer during this step of the data mining process. This step can take many months, as it is difficult to draw firm conclusions from the data. Additionally, the data mining process is a constant cycle, so the analysis can change frequently as new results come in. During data analysis, companies decide whether the findings are sufficient to justify implementing any changes[6].

Executing Modifications

Once the analysis has concluded and the business feels that the data makes a case, the change is carried out. This could be as simple as moving a tab on a company site, or could go as far as changing a brand's logo to appeal more to consumers. Few business decisions are carried out without data to support them, so the process of data mining is important to any business[7].

Relationship Between Data Mining and Social Media

Nature of Relationship

Though there are many ways miners can collect data, social media sites are among the largest sources of user information for data mining. By their nature, platforms such as Twitter, Instagram, Meta (formerly known as Facebook), and TikTok have access to users' personal data, including gender, ethnicity, age, and even location. Additionally, each platform's own algorithms can track which posts users like, how much time they spend on the site, and how they engage with other users and ads. Combined, this data gives social media sites a picture of the behavioral and personal characteristics of each user. Platforms therefore hold a wealth of information that they can sell to other companies looking to learn more about their potential customers. This is most often used in company marketing efforts: social media companies use the user data to generate customer profiles based on both demographic and behavioral factors, which is useful for companies that want advertisements for their products targeted at those most likely to purchase them. As such, social media platforms are a lucrative space in which to mine data and collect user information.[8]
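To make the idea of a customer profile concrete, the sketch below groups users into segments by one demographic factor (age) and one behavioral factor (daily time on the platform). The field names, age bands, and the 60-minute engagement threshold are hypothetical values chosen for illustration; no platform's actual profiling method is described in the cited sources.

```python
from collections import defaultdict


def age_band(age):
    """Map an age in years to a coarse demographic band (illustrative cut-offs)."""
    if age < 25:
        return "under_25"
    if age < 45:
        return "25_to_44"
    return "45_plus"


def engagement_level(minutes_per_day):
    """Label a user as high or low engagement (assumed 60-minute threshold)."""
    return "high" if minutes_per_day >= 60 else "low"


def build_segments(users):
    """Group user IDs by (age band, engagement level)."""
    segments = defaultdict(list)
    for user in users:
        key = (age_band(user["age"]), engagement_level(user["minutes_per_day"]))
        segments[key].append(user["user_id"])
    return dict(segments)


# Hypothetical user records.
users = [
    {"user_id": "u1", "age": 19, "minutes_per_day": 120},
    {"user_id": "u2", "age": 22, "minutes_per_day": 95},
    {"user_id": "u3", "age": 31, "minutes_per_day": 20},
    {"user_id": "u4", "age": 52, "minutes_per_day": 75},
]
for segment, members in build_segments(users).items():
    print(segment, members)
```

An advertiser could then target one segment, for example highly engaged users under 25, rather than the whole user base. Even two coarse attributes are enough to sort people into marketable groups.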

Social Media Mining

When data is collected and mined from social media sites in particular, the practice is known as social media mining. Using the data that platform algorithms collect, companies can now go beyond simply trying to interact with the average user. This is evident in the rise of "social media influencers" and micro-influencers. Companies can track which individuals perform best under each platform's algorithm and which people are promoted most often. From there, businesses can decide whom to send public relations packages to, and thereby expand the reach of their product[9]. Such forms of marketing have also been linked to increased sales, as customers are more likely to purchase products endorsed by their favorite influencer than products promoted directly by the brand.

This practice has drawn considerable controversy in recent years. Many have claimed that when companies reach out only to individuals who are doing well "algorithmically", they drown out smaller content creators whom others feel are more deserving. Critics have also argued that relying solely on the data provided by social media sites can skew whom companies view as an "influencer", since some individuals are promoted simply because their content consistently goes viral, even when it does not go viral for the "right reasons". Businesses, on the other hand, argue that access to social media data on user content helps them identify which creators interact with users the most. In their view, it is a fair way to make use of current trends and develop marketing campaigns that are better tailored to today's consumers[10][11].

Facebook and Cambridge Analytica Privacy Scandal

The relationship between data mining and social media was brought to light in 2018 by the scandal involving Facebook and Cambridge Analytica, a consulting firm based in Britain. In short, the personal information and data of millions of Facebook users was provided to Cambridge Analytica. After data from close to 87 million Facebook users was obtained, Cambridge Analytica produced analysis of this data that aided the campaigns of both Donald Trump and Ted Cruz during their runs for the presidency in 2016. It was also speculated that the collected data was used to influence other major events, including the Brexit referendum. However, such speculations were not confirmed.[12]

Ethical Concerns of Data Mining

While data mining has become a regularly used tool for many companies, some people, including members of the general public, have raised concerns about the nature of this analytical intelligence. One worry is the relative ease with which companies are able to access the personal information of users. This concern is tied specifically to the flow of information between social media platforms and other companies. Many users object to the lack of transparency from such companies, which often fail to properly explain what information is being collected, sold, and used by them and by other businesses. As a result, many have argued that such uses of personal data go against certain FTC regulations and should be considered a breach of privacy[13]. Companies that use user data, on the other hand, argue that it is up to users to read the relevant "Terms of Agreement" clauses to understand how and when their data is being used. Businesses state that data mining is a useful analytical tool for marketing that ultimately benefits customers and gives companies the information they need to grow and develop.[14]

Google Tracking

Along with Facebook, another large company that has come under fire for misuse of consumer data is Google. The criticism mainly concerns the tracking software within Google Maps and Google Search. It was discovered that even after users turned off the tracking option in Maps or Search, Google continued to collect this data without their knowledge. This means that Google was building a backlog of information on the locations users frequented, their common searches, and other private details. From there, Google was able to push out millions of micro-ads personalized to each consumer, resulting in over 135 billion dollars in company revenue. When members of the general public were asked about these practices, the majority said that they had not been aware of just how much of their information Google had access to. Consequently, Google received much public backlash over the lack of transparency surrounding its data collection methods[15].

Regulations Surrounding Data Mining

Due to societal concerns about data transparency and breaches of privacy, certain laws have been established to regulate business data mining and protect consumers. These regulations govern how businesses collect, sell, and analyze the data they have obtained, and businesses that fail to comply often come under heavy fire from both the government and the general public. Within the United States, the Federal Trade Commission (FTC) currently oversees regulations related to data. Many of these fall under what is known as The Privacy Bill of Rights[16].

Safeguards Rule

The FTC's Safeguards Rule, which took effect in 2003, aims to protect consumer information by establishing a clear guide to what practices a business can and cannot conduct with regard to data. Under the Safeguards Rule, businesses that conduct data mining are required to create and carry out an information security program to protect user data. This program must comply with administrative requirements and must be appropriate to the size and maturity of the business. In other words, larger corporations are required to have much more advanced and secure programs than newly created businesses. Additionally, the FTC has established three main objectives that all security programs must follow: "to ensure the security and confidentiality of customer information, to protect against anticipated threats or hazards to the security or integrity of that information, and to protect against unauthorized access to that information that could result in substantial harm or inconvenience to any customer". Failure to comply with this rule can result in heavy consequences, including large federal fines and even court cases[17].

Gramm-Leach-Bliley Act

In similar fashion, the Gramm-Leach-Bliley Act (GLB Act) works to support the regulations established by the Safeguards Rule. This act was passed in 1999 and addresses company practices regarding data sharing. In particular, the GLB Act states that, along with protecting consumer information, financial companies must also be transparent about how they share such data with other companies. The act aims to help secure customer information, especially in loan and investment markets. As with the Safeguards Rule, violating the GLB Act can result in serious consequences for a business[18].

Computer Fraud and Abuse Act

Enacted in 1986 as an amendment to a 1984 federal computer fraud statute, the Computer Fraud and Abuse Act (CFAA) prohibits businesses from gaining unauthorized access to computer information, whether through physical means or hacking. Given the broad nature of the CFAA, it has been amended over the years to try to keep up with advancing technology. However, many have complained that the CFAA does not clearly define what "unauthorized access" is and that it has proven ineffective as a result. Additionally, since it was last amended in 2008, many feel that it fails to act as a real form of regulation for modern-day technology[19].

References

  1. Mikut, R., & Reischl, M. (2011). Data Mining Tools. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(5), 431–443. https://doi.org/10.1002/widm.24
  2. Twin, A. (2023, January 20). What is data mining? how it works, benefits, techniques, and examples. Investopedia. Retrieved January 26, 2023, from https://www.investopedia.com/terms/d/datamining.asp
  3. Data Mining Process. (n.d.). Data Mining for Managers. https://doi.org/10.1057/9781137406194.0011
  4. What is data mining: Definition, examples, tools, and techniques (for beginners). Georgia Tech Boot Camps. (2021, June 14). Retrieved January 26, 2023, from https://bootcamp.pe.gatech.edu/blog/what-is-data-mining/
  5. Data Mining Process. (n.d.). Data Mining for Managers. https://doi.org/10.1057/9781137406194.0011
  6. Data Mining Process. (n.d.). Data Mining for Managers. https://doi.org/10.1057/9781137406194.0011
  7. Mikut, R., & Reischl, M. (2011). Data Mining Tools. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(5), 431–443. https://doi.org/10.1002/widm.24
  8. Twin, A. (2023, January 20). What is data mining? how it works, benefits, techniques, and examples. Investopedia. Retrieved January 26, 2023, from https://www.investopedia.com/terms/d/datamining.asp
  9. Mikut, R., & Reischl, M. (2011). Data Mining Tools. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(5), 431–443. https://doi.org/10.1002/widm.24
  10. Is data mining illegal? Data Science Degree Programs Guide. (2022, September 26). Retrieved February 10, 2023, from https://www.datasciencedegreeprograms.net/faq/who-regulates-data-mining/#:~:text=While%20data%20mining%20itself%20is,must%20be%20mined%20with%20caution.
  11. Wikimedia Foundation. (2022, December 23). Social media mining. Wikipedia. Retrieved February 10, 2023, from https://en.wikipedia.org/wiki/Social_media_mining#:~:text=Social%20media%20mining%20is%20the,to%20users%20or%20conducting%20research.
  12. Twin, A. (2023, January 20). What is data mining? how it works, benefits, techniques, and examples. Investopedia. Retrieved January 26, 2023, from https://www.investopedia.com/terms/d/datamining.asp
  13. Pedersen, J. S., & Wilkinson, A. (2019). The promise, application and pitfalls of Big Data. Big Data, 1–12. https://doi.org/10.4337/9781788112352.00005
  14. van Wel, L., & Royakkers, L. (2004). Ethical issues in web data mining. Ethics and Information Technology, 6(2), 129–140. https://doi.org/10.1023/b:etin.0000047476.05912.3d
  15. Price, R. (2019, April 10). Consumers are unaware of many of Google's data practices. Digital Content Next. Retrieved February 10, 2023, from https://digitalcontentnext.org/blog/2019/04/05/consumers-are-unaware-of-many-of-googles-data-practices/
  16. Staff, F. L. (2023, January 25). Is there a 'right to privacy' amendment? Findlaw. Retrieved February 10, 2023, from https://www.findlaw.com/injury/torts-and-personal-injuries/is-there-a-right-to-privacy-amendment.html#:~:text=The%20Fourth%20Amendment%20protects%20the,justifies%20protection%20of%20private%20information.
  17. Staff, the P. N. O., & Gaynor, A. (2022, February 11). Gramm-Leach-Bliley Act. Federal Trade Commission. Retrieved February 10, 2023, from https://www.ftc.gov/business-guidance/privacy-security/gramm-leach-bliley-act
  18. Staff, the P. N. O., & Gaynor, A. (2022, February 11). Gramm-Leach-Bliley Act. Federal Trade Commission. Retrieved February 10, 2023, from https://www.ftc.gov/business-guidance/privacy-security/gramm-leach-bliley-act
  19. Computer fraud and abuse act (CFAA). NACDL. (n.d.). Retrieved February 10, 2023, from https://www.nacdl.org/Landing/ComputerFraudandAbuseAct