Data Mining

From SI410
Revision as of 18:23, 16 December 2011 by Robmolly

This illustration shows the role data mining plays in processing information for business use.


Data mining is the practice of analyzing data from various perspectives and summarizing it into useful information. It combines aspects of artificial intelligence, machine learning, statistics and database systems. Data mining software is one of many analytical tools used to analyze data: it lets users examine data from many different angles, categorize it, and summarize the relationships it contains. In more technical terms, data mining is the process of finding correlations or patterns among many fields in large relational databases.[1]

Process

Data mining requires a dataset large enough to contain patterns worth discovering. The data must first be aggregated and stored in a database. Data cleaning then removes noisy entries and entries with missing values.
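
The cleaning step can be sketched as a filter over records. This is a minimal sketch, not a standard procedure: it assumes each record is a Python dict, and the field names ("age", "major") and the plausible age range used to flag noise are hypothetical choices made only for illustration.

```python
# Minimal data-cleaning sketch. The field names and the 0-120 age range
# are hypothetical assumptions, not part of any standard.
REQUIRED_FIELDS = ("age", "major")


def is_complete(record: dict) -> bool:
    """True if every required field is present and not None or empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def is_plausible(record: dict, low: int = 0, high: int = 120) -> bool:
    """Treat out-of-range ages as noise (an assumed, domain-specific rule)."""
    return low <= record["age"] <= high


def clean(records: list) -> list:
    """Drop partially missing records first, then implausible (noisy) ones."""
    return [r for r in records if is_complete(r) and is_plausible(r)]


raw = [
    {"age": 20, "major": "Information"},
    {"age": None, "major": "Economics"},  # missing value: dropped
    {"age": 999, "major": "History"},     # implausible value: dropped
    {"age": 22, "major": ""},             # empty field: dropped
]
print(clean(raw))  # [{'age': 20, 'major': 'Information'}]
```

Checking completeness before plausibility means the range test never sees a missing value.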

Data mining consists of six sub-tasks:[2]

Anomaly Detection

Identifying unusual records that may be interesting in their own right or may be data errors requiring further investigation.
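
One simple way to flag unusual records in a single numeric column is the z-score: how many standard deviations a value lies from the mean. The sketch below assumes that approach; the threshold of 2.0 and the `daily_orders` data are hypothetical, and real anomaly detectors are often far more elaborate.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=2.0):
    """Return the indices of values more than `threshold` standard
    deviations from the mean (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


daily_orders = [10, 12, 11, 13, 12, 95]  # hypothetical data
print(zscore_anomalies(daily_orders))  # [5]
```

With very few points a large outlier inflates the standard deviation itself, which is why the assumed threshold here is 2 rather than the commonly quoted 3.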

Association Rule Learning

Searching for relationships between variables, for example which products customers frequently buy together.
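
Association rules are usually scored by support (how often the items appear together) and confidence (how often the consequent appears when the antecedent does). The sketch below brute-forces every single-item rule A -> B over a hypothetical set of shopping baskets; it is not the Apriori algorithm or any particular library's method, and the thresholds are assumed values.

```python
from itertools import permutations


def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for every
    single-item rule A -> B that meets both thresholds."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in `itemset`.
        return sum(1 for b in baskets if itemset <= b) / n

    items = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(items, 2):
        both = support({a, b})
        if both >= min_support:
            confidence = both / support({a})  # estimate of P(B | A)
            if confidence >= min_confidence:
                rules.append((a, b, both, confidence))
    return rules


baskets = [  # hypothetical transactions
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
for a, b, s, c in pair_rules(baskets):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
# butter -> bread: support=0.50, confidence=1.00
```

Note that the rule is directional: bread -> butter has the same support but only 0.67 confidence, so it falls below the threshold.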

Clustering

Discovering groups or structures in the data whose members are similar to one another, without relying on known structures.
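
k-means is one widely used clustering technique, shown here as an example rather than the only option. This minimal sketch implements Lloyd's algorithm on hypothetical 2-D points; it initializes the centroids with the first `k` points purely for simplicity (practical implementations usually use random or k-means++ initialization).

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm). Returns (labels, centroids)."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialization
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]  # hypothetical data
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```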

Classification

Generalizing known structure so it can be applied to new data, for example labeling an e-mail as spam or legitimate.
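
The k-nearest-neighbors classifier illustrates the idea in a few lines: a new record takes the majority label of the most similar already-labeled records. It is one classifier among many, chosen here for brevity; the two-feature examples, the "low"/"high" labels and `k=3` are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest labeled examples.

    `train` is a list of (feature_tuple, label) pairs."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [  # hypothetical labeled examples
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"), ((5.5, 4.5), "high"), ((4.8, 5.2), "high"),
]
print(knn_classify(train, (1.1, 1.0)))  # low
print(knn_classify(train, (5.1, 4.9)))  # high
```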

Regression

Finding a function to model the data with the least error.
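
For a straight-line model, "least error" usually means ordinary least squares: choose the slope and intercept that minimize the sum of squared residuals. The sketch below uses the closed-form solution on hypothetical data; Python 3.10+ also provides `statistics.linear_regression` for the same fit.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(xs, ys, slope, intercept):
    """The quantity that least squares minimizes."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4]          # hypothetical data
ys = [3.1, 4.9, 7.2, 8.8]
slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # 1.94 1.15
```

Any other line, such as y = 2x + 1, gives a larger `sum_squared_error` on the same data.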

Summarization

Providing a more compact representation of the data, including visualization and report generation.
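
A compact summary can be as simple as a few statistics per group. This sketch assumes the records are dicts and reports count, mean, minimum and maximum of one numeric field for each value of a grouping field; the `sales` data and field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize(rows, group_key, value_key):
    """Return {group: {"count", "mean", "min", "max"}} for one numeric field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        group: {"count": len(vals), "mean": mean(vals),
                "min": min(vals), "max": max(vals)}
        for group, vals in sorted(groups.items())
    }


sales = [  # hypothetical records
    {"store": "A", "amount": 20.0},
    {"store": "B", "amount": 35.0},
    {"store": "A", "amount": 40.0},
    {"store": "B", "amount": 15.0},
]
for store, stats in summarize(sales, "store", "amount").items():
    print(store, stats)
```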

Applications

Data mining is commonly used in business, to determine which demographic groups buy which products and to predict customer decisions, and in science, to find patterns in experimental data.

Examples

A simple example of data mining is analyzing a large population, such as the students of the University of Michigan, to determine simple characteristics of the data, such as the proportion of the student body from each ethnic background.
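
Computing such proportions amounts to counting records per category and dividing by the total. In this sketch the `background` field and the placeholder categories "A", "B" and "C" are hypothetical stand-ins, not real data.

```python
from collections import Counter


def proportions(records, field):
    """Share of records falling into each value of `field`, largest first."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.most_common()}


students = [{"background": b} for b in ["A", "B", "A", "C", "A", "B", "A", "C"]]
print(proportions(students, "background"))  # {'A': 0.5, 'B': 0.25, 'C': 0.25}
```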

Before the term "data mining" came into popular use, many businesses were already applying its techniques. They used powerful computers to comb through large volumes of supermarket scanner data and analyzed the results for market research. This has greatly increased the precision of analysis while reducing the cost of research.[3]

Ethical Implications

Data mining involves building models from accumulated data. In pursuit of a more accurate statistical model, data miners sometimes pry into private information in personal data records. While data mining itself is not inherently unethical, many of its applications are ethically charged.

Mining social networking sites in particular can, and often does, gather a great deal of personal information about an individual. Facebook uses these techniques to sell advertisers very specific target audiences.[4] Because data mining also has useful applications in medicine, patient records could be mined in the same way. This raises concerns about patient confidentiality and about privacy in areas of people's lives that are ordinarily private.

In systems that collect data from people for such applications, good ways to keep the process ethical include anonymizing the data, telling those involved exactly what will happen to their data and how it will be used, and allowing them to opt out.
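
Two of these safeguards can be sketched in code: honoring opt-outs and replacing direct identifiers before the data reaches the miner. The sketch assumes keyed hashing (HMAC-SHA256) to turn names into stable pseudonyms and keeps only the field the analysis needs; the field names (`name`, `purchases`, `opted_out`) are hypothetical. Keyed hashing is pseudonymization rather than full anonymization, since the remaining attributes can still identify people when combined with other data.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key; in practice it would be stored apart from the data.
SECRET_KEY = secrets.token_bytes(32)


def prepare_for_mining(records, key=SECRET_KEY):
    """Drop opted-out people and replace names with keyed-hash pseudonyms."""
    prepared = []
    for r in records:
        if r.get("opted_out"):
            continue  # respect the person's choice to opt out
        pseudonym = hmac.new(key, r["name"].encode("utf-8"),
                             hashlib.sha256).hexdigest()[:16]
        # Keep only the pseudonym and the field the analysis needs.
        prepared.append({"id": pseudonym, "purchases": r["purchases"]})
    return prepared


people = [  # hypothetical records
    {"name": "Alice", "purchases": 3, "opted_out": False},
    {"name": "Bob", "purchases": 5, "opted_out": True},
    {"name": "Alice", "purchases": 1},
]
print(prepare_for_mining(people))
```

The same name always maps to the same pseudonym under one key, so records can still be linked for analysis without exposing the name itself.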

References

  1. Palace, Bill. "What Is Data Mining?" Data Mining. Anderson Graduate School of Management at UCLA, Mar. 1996. Web. 16 Dec. 2011 <http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/index.htm>.
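  2. Fayyad, Usama, Gregory Piatetsky-Shapiro, and Padhraic Smyth. "From Data Mining to Knowledge Discovery in Databases." AI Magazine, 1996. <http://www.kdnuggets.com/gpspubs/aimag-kdd-overview-1996-Fayyad.pdf>.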
  3. Palace, Bill. "What Is Data Mining?" Data Mining. Anderson Graduate School of Management at UCLA, Mar. 1996. Web. 16 Dec. 2011 <http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/index.htm>.
  4. Facebook advertising page. Web. <http://www.facebook.com/advertising/>.

See also