Data Mining is an interesting course I took this past semester, and I really enjoyed it. Today I want to talk about it a little: what it actually is, and how it's done.

Basically, data mining brings together subjects like machine learning, statistical analysis and basic programming in order to analyze data and extract information from what is called a data set. A data set can be a spreadsheet, a database, an ASCII file or really any source of data; when multiple sources are gathered into a single, unified form, that collection is called a data warehouse. It can even contain data typed in from hard copies or handwritten files. Basically, someone gathers all the data needed about something, like records of different types of glass, or people's blood samples, and turns it into a data warehouse.

The data then goes through a process called filtering and preprocessing, which removes useless or noisy data. Useless data is essentially data with missing or misplaced details, duplicate entries, or even outliers, all of which are identified with various statistical methods and measures.

After that, the data is processed by algorithms that ultimately produce a "model". The model is then evaluated to measure its accuracy and to determine whether adjustments, or even a different algorithm, are needed. There are many algorithms and measures for both model creation and evaluation.
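To make the filtering and preprocessing step a bit more concrete, here's a minimal sketch in plain Python. The toy "data warehouse" rows, the IDs and the z-score cutoff of 2 are all my own illustrative assumptions, not anything from the course; real pipelines would use proper tooling, but the idea is the same:

```python
import statistics

# Hypothetical toy "data warehouse": each row is (sample_id, measurement).
rows = [
    ("a", 4.1), ("b", 3.9), ("b", 3.9),   # exact duplicate row
    ("c", None),                          # missing value
    ("d", 4.3), ("e", 4.0), ("f", 4.2),
    ("g", 3.8), ("h", 4.1),
    ("i", 25.0),                          # obvious outlier
]

# 1. Drop rows with missing values.
complete = [r for r in rows if r[1] is not None]

# 2. Remove exact duplicates while keeping the original order.
seen, deduped = set(), []
for r in complete:
    if r not in seen:
        seen.add(r)
        deduped.append(r)

# 3. Flag outliers with a simple z-score test (here: |z| > 2).
values = [v for _, v in deduped]
mean, stdev = statistics.mean(values), statistics.stdev(values)
clean = [(k, v) for k, v in deduped if abs(v - mean) / stdev <= 2]

print(clean)  # the missing, duplicated and outlier rows are gone
```

The z-score is only one of many outlier measures; median-based tests tend to be more robust when the outliers themselves inflate the standard deviation.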
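For the modeling and evaluation part, here's a small sketch of one possible approach: a 1-nearest-neighbour "model" scored by plain accuracy on held-out rows. The numbers, labels and the choice of algorithm are illustrative assumptions on my part, just to show the build-then-evaluate loop:

```python
# Hypothetical labeled data set: (feature, class).
train = [(1.0, "low"), (1.2, "low"), (0.9, "low"),
         (5.0, "high"), (5.3, "high"), (4.8, "high")]
held_out = [(1.1, "low"), (5.1, "high"), (0.8, "low"),
            (4.9, "high"), (3.2, "low")]

def predict(x):
    # The "model" here is the training data itself: predict the class
    # of the single closest training example (1-nearest neighbour).
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]

# Evaluation: compare predictions against the known labels.
correct = sum(1 for x, label in held_out if predict(x) == label)
accuracy = correct / len(held_out)
print(f"accuracy on held-out rows: {accuracy:.2f}")
```

If the accuracy came back too low, this is exactly the point where you'd adjust the model or swap in another algorithm, as described above.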