Data mining

From SourceWatch

Data mining, according to Hyperdictionary, is "Analysis of data in a database using tools which look for trends or anomalies without knowledge of the meaning of the data. Data mining was invented by IBM who hold some related patents."

How It Works

"Data mining is sorting through data to identify patterns and establish relationships. Data mining parameters include:

  • Association - looking for patterns where one event is connected to another event
  • Sequence or path analysis - looking for patterns where one event leads to another later event
  • Classification - looking for new patterns (May result in a change in the way the data is organized but that's ok)
  • Clustering - finding and visually documenting groups of facts not previously known
  • Forecasting - discovering patterns in data that can lead to reasonable predictions about the future

"Data mining techniques are used in mathematics, cybernetics, and genetics. Web mining, a type of data mining used in customer relationship management (CRM), takes advantage of the huge amount of information gathered by a Web site to look for patterns in user behavior."[1]
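The association parameter listed above can be illustrated with a minimal sketch. Assuming a small, hypothetical set of shopping-basket transactions, the Python below counts how often pairs of items occur together and reports rules of the form "customers who bought X also bought Y". Each rule carries a support (the share of all transactions containing both items) and a confidence (the share of transactions containing X that also contain Y). The data, the thresholds, and the function name association_rules are illustrative assumptions, not any particular product's method; production systems typically use algorithms such as Apriori or FP-growth over far larger data sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def association_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    # How many transactions contain each single item.
    item_counts = Counter(item for t in transactions for item in t)
    # How many transactions contain each unordered pair of items.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to be an interesting pattern
        # Evaluate the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


for x, y, s, c in association_rules(transactions):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

With this toy data the strongest rule is "beer -> diapers" (every basket with beer also contains diapers), while "bread -> beer" is filtered out because only half of the bread baskets contain beer.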

Another Definition

According to Webopedia, data mining is "A class of database applications that look for hidden patterns in a group of data that can be used to predict future behavior. For example, data mining software can help retail companies find customers with common interests. The term is commonly misused to describe software that presents data in new ways. True data mining software doesn't just change the presentation, but actually discovers previously unknown relationships among the data.

"Data mining is popular in the science and mathematical fields but also is utilized increasingly by marketers trying to distill useful consumer data from Web sites."

Used by Federal Agencies

The New York Times' Robert Pear reported on May 26, 2004, that a "Survey Finds U.S. Agencies Engaged in 'Data Mining'."

Pear writes that "A survey of federal agencies ... to be issued [May 27, 2004] by the General Accounting Office, an investigative arm of Congress ... has found more than 120 programs that collect and analyze large amounts of personal data [known as data mining] on individuals to predict their behavior."

The practice, the GAO report found, "was ubiquitous." The GAO "found that 52 [agencies] were systematically sifting through computer databases. These agencies reported 199 data mining projects, of which 68 were planned and 131 were in operation. At least 122 of the 199 projects used identifying information like names, e-mail addresses, Social Security numbers and driver's license numbers."

Pear says that "The survey provides the first authoritative estimate of the extent of data mining by the government. It excludes most classified projects, so the actual numbers are likely to be much higher."

The report, Pear writes, reveals that the Defense Department "made greatest use of the technique, with 47 data mining projects to track everything from the academic performance of Navy midshipmen to the whereabouts of ship parts and suspected terrorists."

"Of the 199 data mining projects," he says, "54 use information from the private sector, like credit reports and records of credit card transactions. Seventy-seven projects use data obtained from other federal agencies, like student loan records, bank account numbers and taxpayer identification numbers."
