Data Aggregation Online

From SI410
Revision as of 07:55, 12 December 2012 by Jfrankl (Talk | contribs)


Personal data can be collected with simple code

Data aggregation is the gathering of information about a particular topic; in the online context it refers specifically to information scraped from online sources. This information can then be stored, analyzed with statistical methods, and put to use. The first instances of data aggregation took the form of surveys, polls, interviews, and public data pulls. The Internet has made aggregation dramatically easier. Internet users leave bits of their information across the Internet in many forms: cookies, sessions, user IDs, forum posts, and social networking information. With the right technology, a savvy user can aggregate this data, analyze it, and apply it in business areas such as marketing, advertising, search engine optimization, and even usability.


In the broadest sense, data aggregation is any gathering of information. Its purposes range from large-scale projects, such as compiling demographic information about specific populations, to very small ones, such as calculating a database's average boot-up time. Anything that can be statistically analyzed can be aggregated[1]. Traditionally, gathering information about people meant surveying them directly. There was no way to gather demographic data such as age, income, or marital status except by obtaining it through the government. With the rise of the Internet and social media sites, more and more people are willing to make their information public. Data miners and aggregators can now gather much of a person's information in one fell swoop. Data that once took considerable time and effort to gather now takes only a few moments. Moreover, almost any institution can do this itself; it does not have to hire an outside statistics company to do the work.
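As a small-scale illustration of aggregation in the statistical sense, the sketch below reduces a handful of boot-up times to a few summary figures. The sample values and the names `boot_times_seconds` and `summarize` are invented for illustration and do not come from any real database.

```python
from statistics import mean, median

# Hypothetical boot-up times (in seconds) recorded over five restarts.
boot_times_seconds = [12.0, 15.0, 11.0, 14.0, 13.0]

def summarize(samples):
    """Aggregate raw measurements into a few summary statistics."""
    return {
        "count": len(samples),
        "mean": mean(samples),
        "median": median(samples),
        "max": max(samples),
    }

summary = summarize(boot_times_seconds)
print(summary)
```

The same pattern, many raw observations reduced to a few numbers, applies whether the observations are boot times or demographic records.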

What is Being Aggregated



Google keeps track of every web page its search engine has ever visited, as well as every click its users have made.[2] To gather more relevant information about its users, Google began offering other services that require users to log in. Once users are logged in, Google can link their activity across Google applications, Gmail, and search. An algorithm is then used to group people with similar interests together. Simple details such as where people log in from and which web browser they use can now be tracked. Every search query is logged, which has the potential to reveal a lot about a user's personality. Google's eventual goal is to build algorithms from the information in its data that give it the power to answer more hypothetical questions. Google also uses this data to better target advertisements to users (through AdWords) based on current searches, previous searches, email contents, and other such metrics.



While Google uses its algorithms to sift through a user's data and infer more information about them, Facebook gets users to tell the website basic information directly, such as where they are from, their age, and their gender. "Facebook Everywhere" was announced on April 21, 2010. It made it possible for a user to press a 'like' button on an ad elsewhere on the web, letting Facebook collect information about both the user and the ad. Facebook also uses information such as the relationships among people to compete with Google and to show the most relevant advertisements to each user.

We purposefully leave pieces of our identity across the Internet, almost without hesitation, because doing so is so common. Cookies on our computers can track which websites we view and which items we look at, in addition to our usernames and real names. This data can be aggregated into a user profile of our interests. The profile can then be sold to companies that want to advertise to us, or simply kept as statistics. Either way, many people are not aware that this data is trackable and that it paints an increasingly accurate portrait of them. Furthermore, it has been alleged, though not proven, that Facebook has sold user information, since the data it has aggregated over the years could give it significant financial leverage with other companies.
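A minimal sketch of how tracked page views might be rolled up into an interest profile follows. It assumes each view has already been logged as a cookie ID plus a page category. The log entries, the cookie IDs, and the function names `build_interest_profiles` and `top_interest` are all hypothetical, and this is not a description of how any particular company does it.

```python
from collections import Counter, defaultdict

# Hypothetical tracking log: (cookie ID, category of the page viewed).
page_views = [
    ("cookie-a", "sports"),
    ("cookie-a", "sports"),
    ("cookie-a", "travel"),
    ("cookie-b", "cooking"),
    ("cookie-b", "cooking"),
    ("cookie-b", "sports"),
]

def build_interest_profiles(views):
    """Count page categories per cookie ID to form a rough interest profile."""
    profiles = defaultdict(Counter)
    for cookie_id, category in views:
        profiles[cookie_id][category] += 1
    return profiles

def top_interest(profile):
    """Return the most frequently viewed category in one profile."""
    return profile.most_common(1)[0][0]

profiles = build_interest_profiles(page_views)
for cookie_id, profile in profiles.items():
    print(cookie_id, dict(profile), "top interest:", top_interest(profile))
```

Even this toy version shows why the result is attractive to advertisers: a few lines of counting turn a raw browsing log into a ranked list of interests per visitor.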

Pros, Cons and Their Ethical Implications

Pros of Online Data Aggregation

Each time a user signs up for a website, they have to remember another username, password, and/or PIN. Companies now use data aggregation to consolidate this information (from banks, airlines, e-mail accounts, and various reward programs) so that users can access all of it in one convenient place. Online bill pay and stock tracking can be offered in the same place as well [3]. This becomes more useful as the average user signs up for more sites and as traditionally offline services (such as banking and financial services) move online. It also makes the host site an attractive place to navigate to, which is what draws businesses to this model. The potential ethical problem lies in giving one site all of a person's personal data. Although seeing bill pay, bank statements, and e-mail in one place is convenient, it means that if one password is cracked, hackers have access to everything rather than just one account. The user also has to grant a third-party site access to their personal data, which means that more than just the user can access it.

There is also the argument that aggregating public information saves businesses and researchers time and money, and that because the data is public, they have every right to use it. Anyone has access to it, and gaining access is not illegal. On this view, an aggregation program seeing a piece of information is no different from a friend seeing it online: the friend can also pull information about other users, so there is little difference between what a friend does with the information and what an aggregation program does with it. This pro can also be seen as a con when the ethical implications are analyzed. Phone books, which have been around for more than a century, list everyone with a home phone so that anyone else in the area can look up a phone number and address. But people could opt out of phone books, and each book was limited to a small geographic area. Once this information is put online, there is no guarantee that it is ever truly gone if the user wants to opt out, and it can be looked up by anyone with an Internet connection. The user loses control of their personal data when it is transferred to the online world.

Cons of Online Data Aggregation

As previously mentioned, Internet users knowingly and unknowingly leave pieces of themselves across multiple sites. Data aggregators can combine all of this information if it is public, or if a clause in a site's terms of agreement informs the user that their data could be sold or given to another company. In essence, a third-party website that you have never heard of (let alone signed up for) could create a profile for you with your name, your family members (pulled from having them confirmed on Facebook), your address and home phone number (public information), your age and birthday (from signing up on another site to get a free surprise on your birthday), and data on your interests (from tracking cookies). This profile is exactly what data aggregation produces: gathered information about someone. But when all the pieces are put together, it becomes glaringly obvious how much is shared online and how invasive it can feel. This can eliminate any sense of control over your personal data on the Internet: even if you monitor who you let into your social networking circle, the information you keep there can still get out. The ethical implications follow those of the phone book example above. Users lose the ability to control who has their data as sites take their public data and make multiple digital copies of it. If a user wants to remove their public data or opt out of this 21st-century phone book, there is no way to be certain that all of it has been erased from the Internet (or erased in general, not just from technological mediums).
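The profile-building scenario described here can be sketched in a few lines. The example below assumes, purely for illustration, that every source identifies the person by the same name. All records, field names, and the function `aggregate_profile` are made up, and real record matching is considerably messier.

```python
# Hypothetical records about one person, each from a different kind of source.
social_network = {"name": "Pat Example", "family": ["Sam Example"]}
public_directory = {"name": "Pat Example", "address": "1 Sample St", "phone": "555-0100"}
birthday_club = {"name": "Pat Example", "birthday": "01-01"}
tracking_cookies = {"name": "Pat Example", "interests": ["sports", "travel"]}

def aggregate_profile(*sources):
    """Merge records that share the same name into one combined profile."""
    profile = {}
    for record in sources:
        if profile and record.get("name") != profile.get("name"):
            continue  # skip records that belong to someone else
        profile.update(record)
    return profile

profile = aggregate_profile(
    social_network, public_directory, birthday_club, tracking_cookies
)
print(sorted(profile))
```

No single source above holds more than two or three facts, yet the merged record contains the name, family, address, phone number, birthday, and interests together, which is the effect the paragraph describes.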

Another con of online data aggregation is what can be done with information once it has been pulled from various sources and placed in one spot. One example comes from Sean Gorman, a graduate student at George Mason University. Gorman's thesis used data aggregation to create a map of the United States' entire fiber-optic grid and of where each business and industry connected to it. The map showed where all the major hubs in the United States were and where cutting the fiber-optic lines with an axe would cause the most damage. All of his data came from public sources, so he did nothing illegal to obtain it, and anyone could have made the same map. The United States government, however, saw it as a terrorist threat and threatened to block publication of the dissertation, since if it were published, anyone could take that axe and cause major problems for U.S. businesses and the economy [4]. This illustrates the potential dangers of data aggregation. Even though all the data used was public, the pieces put together can pose a major threat to the country, and this is itself an ethical problem.

See Also

External Links


  3. Role of the internet
