Statistical Modeling

From SI410
Revision as of 21:28, 11 October 2012 by Jadraad

A model of global temperatures [1]

Statistical modeling is the process of approximating events of interest using statistical methods and interpretation. The goal of statistics is to bring uncertainty and complexity into an organized form so that they can be quantified.[2] This is useful for assessing the accuracy of collected data or for approximating the course of an event that has not yet occurred. Though much of statistics can be performed without computers, technology plays a critical role in generating accurate and useful models from data sets of any size.

Background

The main goal of statistical modeling is to generalize to a population and expose the relationships between variables of interest.[3] This can be achieved in a number of ways, using a variety of methods.
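As a small illustration of exposing a relationship between two variables, the sketch below fits a straight line by ordinary least squares. The function name `fit_line` and the study-hours data are hypothetical and chosen only for this example; a straight line is just one of many possible models.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data: hours studied (independent) and exam score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
slope, intercept = fit_line(hours, scores)
print(f"score ~ {intercept:.1f} + {slope:.1f} * hours")
```

The slope estimates how much the dependent variable changes per unit of the independent variable in this sample. Whether that relationship generalizes to the population is a separate question that a hypothesis test would address.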

Generalized Approach

A typical research approach that can be applied to statistical modeling is the scientific method. It begins with a question, which may be posed by an individual, a group of researchers, or a company. It is important to formulate a strong hypothesis, as this will be the basis of the research being conducted. The hypothesis should name the independent and dependent variables so that a relationship between them (if any exists) can be exposed. The next step is to decide what data needs to be collected to help answer the question and to provide enough evidence to support or reject the hypothesis. The researcher then plans a methodology for data collection. Because the methodology strongly influences the results, the researcher spends a large amount of time planning an experiment that will be effective while also minimizing errors. The experiment is then executed and the data are tabulated. At this point, the data can be quantified and an interpretation can be made.

The Hypothesis

The hypothesis is "an assumption about the population parameter."[4] It is based on background research and formulated accordingly. Usually, it takes the form of a null hypothesis and an alternative hypothesis, where the null states that there is no statistically significant difference in the data and the alternative states that there is one. Statisticians choose either to reject or to fail to reject the null hypothesis based on certain tests, which vary depending on the type of data collected.
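One way to make the decision about the null hypothesis concrete is a permutation test, sketched below. This is only one of many possible tests. The function name `permutation_test`, the measurements, and the 0.05 significance level are assumptions made for the example.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=5_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the fraction of random relabelings whose absolute mean
    difference is at least as large as the observed one, which serves
    as an estimated p-value under the null hypothesis of no difference.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical measurements for a control group and a treated group.
control = [5.1, 4.9, 5.0, 5.2, 4.8, 5.1]
treated = [5.9, 6.1, 6.0, 5.8, 6.2, 6.0]

alpha = 0.05  # assumed significance level
p_value = permutation_test(control, treated)
if p_value < alpha:
    print(f"p = {p_value:.4f}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f}: fail to reject the null hypothesis")
```

If the group labels really made no difference, shuffling them should often produce a gap as large as the observed one. A small p-value means this rarely happens, which counts as evidence against the null hypothesis.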

Methods of Data Collection

There are two main forms of data, qualitative and quantitative, and the form can drastically affect the type of collection that needs to be done. Qualitative data takes the form of descriptions of what is being sampled, so the most common way of collecting it is observation. Quantitative data, on the other hand, consists of numbers and is collected by counting or measuring.[5]

Here are some common methods of data collection.

Method               Description
Census               Examines the entire population [6]
Sample Survey        Examines a fraction of the population [6]
Experiment           Controlled method of data collection, based on causal relationships between sample groups [6]
Observational Study  Also examines causal relationships, but with less control over sample groups [6]

These methods all have their uses in specific contexts, and it is up to the researcher to decide which method will be most effective to serve their purpose.
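The difference between a census and a sample survey can be sketched with simulated data. The population of ages below is randomly generated and purely hypothetical, as are the population size (10,000) and the sample size (200).

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 people.
population = [rng.randint(18, 90) for _ in range(10_000)]

# Census: measure every member of the population.
census_mean = mean(population)

# Sample survey: measure a simple random sample of 200 members,
# drawn without replacement.
sample = rng.sample(population, 200)
sample_mean = mean(sample)

print(f"census mean: {census_mean:.2f}")
print(f"sample mean: {sample_mean:.2f}")
print(f"estimation error: {abs(sample_mean - census_mean):.2f}")
```

The sample survey measures 2% of the population and typically lands close to the census value. This is why a sample is often preferred when examining everyone is too costly.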

Data Analysis

A chart to help determine what type of test to use [7]

Numerical Analysis

There are many ways of analyzing data numerically, with tests available for practically any type of data, grouping, or sample. Examples include comparing means, t-tests, ANOVA, and linear regression. The accompanying chart [7] is helpful for deciding which method of analysis to use. The factors that determine which test is most appropriate include the number of groups, whether the samples are paired or unpaired, and the specific type of data being analyzed. It is important to use the correct test: an improper test can lead to incorrect results, and in some cases the calculations cannot be carried out at all. The overall goal of numerical analysis is to describe or compare one or more sets of data. It typically yields a test statistic, which is used along with values such as degrees of freedom and standard scores to reach a conclusion about the hypothesis.
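As an illustration of a test statistic and its degrees of freedom, the sketch below computes both for Welch's unequal-variance t-test on two unpaired groups. The function name `welch_t` and the data are hypothetical. The choice of Welch's variant is an assumption made for the example, not a recommendation from the chart.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test.

    Each sample needs at least two values, and the samples must not
    both have zero variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical measurements from two unpaired groups.
group_a = [2, 4, 6, 8]
group_b = [1, 2, 3, 4]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.3f}, degrees of freedom = {df:.2f}")
```

The resulting t value would then be compared with a critical value from a t-distribution table at the computed degrees of freedom, or converted to a p-value with statistical software.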

Graphical Analysis

Interpreting the Results

Uses

In Medicine

Recommender Systems

Recommender systems are based on algorithms and statistical models, which can be very complex. Their goal is to make suggestions to customers based on past searches or past purchases. They can be found on retail sites like Amazon and music sites like Spotify, and even on WebMD, where they suggest what illness a person may be experiencing.
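A very small sketch of the idea follows. It suggests items bought by customers whose purchases overlap with the target customer's. The function name `recommend`, the customer names, and the purchase histories are all hypothetical. Production systems such as those named above use far more elaborate models, and this is not a description of how any of them work.

```python
from collections import Counter

def recommend(purchases, customer, top_n=3):
    """Suggest items bought by customers who share purchases with `customer`.

    `purchases` maps each customer name to a set of purchased items.
    Each candidate item is scored by the total overlap between the
    target customer and the customers who bought it.
    """
    own = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(own & items)  # number of shared items
        if overlap == 0:
            continue  # no shared history, so no evidence of similar taste
        for item in items - own:  # only suggest items not already owned
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]

# Hypothetical purchase histories.
purchases = {
    "ana": {"tent", "stove", "lantern"},
    "ben": {"tent", "stove", "sleeping bag"},
    "cho": {"tent", "sleeping bag", "boots"},
    "dev": {"novel"},
}
print(recommend(purchases, "ana"))
```

Weighting each suggestion by the size of the overlap is a simple design choice: customers with more purchases in common count as stronger evidence of similar taste.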


Ethical Concerns

Avoiding Bias

Misleading Statistics

See Also

Data Mining
Data Aggregation Online

References

  1. Cobb, Loren. The Causes of Global Warming: A Graphical Approach. 2007. Photograph. n.p. Web. 9 Oct 2012. <http://tqe.quaker.org/2007/TQE158-EN-GlobalWarming-V1.html>.
  2. Kaplan, Daniel. Statistical Modeling: A Fresh Approach. 2nd ed. Saint Paul: Project Mosaic, 2012. vii. Print.
  3. http://www.sportsci.org/resource/stats/models.html
  4. http://stattrek.com/hypothesis-test/hypothesis-testing.aspx
  5. http://www.regentsprep.org/Regents/math/ALGEBRA/AD1/qualquant.htm
  6. http://stattrek.com/statistics/data-collection-methods.aspx
  7. http://twopaces.com/stats_help.html

External Links