Data Mining

From SI410
Revision as of 06:50, 16 December 2011 by Guo (Talk | contribs)

Data mining is the analysis of data from multiple perspectives in order to summarize it into useful information. It combines aspects of artificial intelligence, machine learning, statistics, and database systems. Data mining software is one of many analytical tools used to analyze data: it lets users examine data from different angles, categorize it, and summarize the relationships it finds. In more technical terms, data mining is the process of finding correlations or patterns among the many fields of large relational databases.[1]

Example

  • A simple example of data mining is analyzing the population of the University of Michigan and drawing inferences and correlations from the results, such as categorizing students by ethnic background.
  • Before the term data mining came into use, many businesses had already adopted the underlying technology. They used powerful computers to comb through quantitative data from supermarket scanners and analyzed the results for market research. This practice has greatly increased the precision of analysis while reducing the cost of research.[2]

Process

Data mining can only occur on a dataset large enough to contain discoverable patterns. The data must first be aggregated and stored in a database. Data cleaning then removes noisy or partially missing entries.
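The cleaning step can be sketched as follows. This is a minimal illustration, not a standard procedure: the record layout, the field names, and the choice of a median-based outlier rule (a modified z-score with a cutoff of 3.5) are all assumptions made for the example.

```python
from statistics import median

def drop_incomplete(records, required_fields):
    """Keep only records where every required field is present and non-empty."""
    return [
        r for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    ]

def drop_noisy(records, field, threshold=3.5):
    """Drop records whose value for `field` lies far from the median.

    Uses the modified z-score, which is based on the median absolute
    deviation (MAD). The threshold is an assumed, commonly used cutoff.
    """
    values = [r[field] for r in records]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to measure against; keep everything.
        return list(records)
    return [
        r for r in records
        if 0.6745 * abs(r[field] - med) / mad <= threshold
    ]

# Hypothetical student records: one has a missing age, one an implausible age.
students = [
    {"id": 1, "age": 19},
    {"id": 2, "age": 21},
    {"id": 3, "age": None},
    {"id": 4, "age": 20},
    {"id": 5, "age": 22},
    {"id": 6, "age": 210},
    {"id": 7, "age": 20},
]
cleaned = drop_noisy(drop_incomplete(students, ["age"]), "age")
```

Incomplete records are removed first so that the outlier rule only ever sees numeric values.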

Data mining consists of six sub-tasks:[3]

  • Anomaly detection: Identifying unusual records that may be anomalies or errors.
  • Association rule learning: Searching for general relationships between variables.
  • Clustering: Detecting groups or structures of similar records within the data.
  • Classification: Applying known structures to new data.
  • Regression: Finding a function to model the data with the least error.
  • Summarization: Providing context and reporting findings.
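Two of the sub-tasks above, association rule learning and regression, can be sketched in a few lines. This is only an illustration under simplifying assumptions: rules are limited to pairs of items, the support and confidence thresholds are arbitrary, and regression is restricted to fitting a straight line by ordinary least squares. The function names and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = fraction of baskets containing both items
    confidence = fraction of baskets with the antecedent that also
                 contain the consequent
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical shopping baskets and a small numeric series.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
rules = pair_rules(baskets)
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Practical association rule miners handle item sets of any size and prune the search space; the pairwise version here only shows what support and confidence measure.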

Applications

Data mining is commonly used in business to determine which demographics are buying which products and to predict customer decisions, and in science to find patterns in experimental data.
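As a rough sketch of the business case, purchases can be counted per demographic group and product. The record fields ("group", "product") and the sample data are assumptions made for illustration; a real analysis would work on far larger data and test whether the differences are meaningful.

```python
from collections import Counter

def purchases_by_group(purchases):
    """Count purchases for each (demographic group, product) pair."""
    return Counter((p["group"], p["product"]) for p in purchases)

def top_product_per_group(purchases):
    """Return the most purchased product for each group.

    When two products tie within a group, the one counted first is kept.
    """
    best = {}
    for (group, product), count in purchases_by_group(purchases).items():
        if group not in best or count > best[group][1]:
            best[group] = (product, count)
    return {group: product for group, (product, _) in best.items()}

# Hypothetical purchase log.
purchases = [
    {"group": "18-24", "product": "energy drink"},
    {"group": "18-24", "product": "energy drink"},
    {"group": "18-24", "product": "coffee"},
    {"group": "25-34", "product": "coffee"},
    {"group": "25-34", "product": "coffee"},
    {"group": "25-34", "product": "tea"},
]
counts = purchases_by_group(purchases)
favorites = top_product_per_group(purchases)
```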

Ethical Implications

Data mining builds models from accumulated data. In an attempt to build an accurate statistical model, data miners sometimes pry into private information in personal data records. While the technique itself is ethically neutral, many of its applications are ethically charged.

Mining social networking sites in particular can, and often does, accrue a great deal of personal information about an individual. Facebook uses these techniques to sell advertisers very specific target audiences.[4] Because data mining also has useful applications in medicine, patient records could be accessed in the same way. This raises concerns about patient confidentiality and about breaches of privacy in ordinarily private areas of people's lives.

When a system supplies data about people for such applications, a good way to keep the process ethical is to maintain the anonymity of the data, to inform those involved of exactly what will happen to their data and how it will be used, and to allow them to opt out.
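One way such safeguards might look in practice is sketched below: records from people who opted out are skipped, directly identifying fields are dropped, and the user identifier is replaced with a keyed hash. The field names ("user_id", "opted_out", "name", "email") are hypothetical. Note that a keyed hash is pseudonymization rather than full anonymization, since the remaining fields may still allow someone to be re-identified.

```python
import hashlib
import hmac

def prepare_for_mining(records, secret_key, id_field="user_id",
                       drop_fields=("name", "email")):
    """Return copies of `records` that are safer to hand to an analysis step.

    - Records flagged as opted out are excluded entirely.
    - Fields listed in `drop_fields` are removed.
    - The identifier is replaced by an HMAC-SHA256 pseudonym, so the same
      person maps to the same token without exposing the original id.
    """
    prepared = []
    for record in records:
        if record.get("opted_out"):
            continue
        token = hmac.new(
            secret_key,
            str(record[id_field]).encode("utf-8"),
            hashlib.sha256,
        ).hexdigest()
        cleaned = {
            key: value
            for key, value in record.items()
            if key not in drop_fields and key not in (id_field, "opted_out")
        }
        cleaned["pseudonym"] = token
        prepared.append(cleaned)
    return prepared

# Hypothetical records; the key would be kept secret by the data holder.
users = [
    {"user_id": 1, "name": "A. Example", "email": "a@example.com",
     "age": 20, "opted_out": False},
    {"user_id": 2, "name": "B. Example", "email": "b@example.com",
     "age": 31, "opted_out": True},
    {"user_id": 1, "name": "A. Example", "email": "a@example.com",
     "age": 20, "opted_out": False},
]
safe = prepare_for_mining(users, secret_key=b"replace-with-a-real-secret")
```

Informing people how their data will be used is a matter of policy and cannot be enforced in code; the sketch only covers the anonymity and opt-out parts.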

References

  1. Palace, Bill. "What Is Data Mining?" Data Mining. Anderson Graduate School of Management at UCLA, Mar. 1996. Web. 16 Dec. 2011 <http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/index.htm>.
  2. Palace, Bill. "What Is Data Mining?" Data Mining. Anderson Graduate School of Management at UCLA, Mar. 1996. Web. 16 Dec. 2011 <http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/index.htm>.
  3. http://www.kdnuggets.com/gpspubs/aimag-kdd-overview-1996-Fayyad.pdf
  4. http://www.facebook.com/advertising/

See also