Data Warehousing and Data Mining

This course introduces advanced aspects of data warehousing and data mining, covering the principles, research results and commercial applications of current technologies.

Course Content

Unit 1: Introduction

This unit includes the following topics: Types of databases (Relational Databases, Data Warehouses, Transactional Databases), Functionalities of data mining – What kinds of patterns can be mined?, Association Analysis, Cluster Analysis, Outlier Analysis, Evolution Analysis, Stages of Knowledge Discovery in Databases (KDD), Setting up a KDD environment, Issues in Data Warehouse and Data Mining, Applications of Data Warehouse and Data Mining

Unit 2: Data Warehouse for Data mining

This unit includes the following topics: Differences between operational database systems and data warehouses, Data Warehouse Architecture, Distributed and Virtual Data Warehouse, Data Warehouse Manager, Data marts, Metadata, Multidimensional data model, From Tables and Spreadsheets to Data Cubes, Star schema, Snowflake schema and Fact constellation schema

Unit 3: OLAP technology for Data Mining

This unit includes the following topics: On-line analytical processing models and operations (drill down, drill up, slice, dice, pivot), Types of OLAP Servers: ROLAP versus MOLAP versus HOLAP, OLTP
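
As a rough illustration of the OLAP operations named in this unit, the following sketch applies them to a tiny in-memory fact table. The data, the dimension names and the helper functions (aggregate, dice, slice_) are all hypothetical and chosen only for this example; a real OLAP server would work on a stored cube.

```python
from collections import defaultdict

# Hypothetical fact table: (year, quarter, region, product, sales).
DIMS = ("year", "quarter", "region", "product")
FACTS = [
    (2023, "Q1", "East", "Laptop", 100),
    (2023, "Q2", "East", "Laptop", 150),
    (2023, "Q1", "West", "Phone", 80),
    (2024, "Q1", "East", "Phone", 120),
    (2024, "Q2", "West", "Laptop", 200),
]

def aggregate(facts, keep):
    """Sum sales grouped by the dimensions listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def dice(facts, **allowed):
    """Keep rows whose value lies in the allowed set for each named dimension."""
    return [
        row for row in facts
        if all(row[DIMS.index(d)] in values for d, values in allowed.items())
    ]

def slice_(facts, dim, value):
    """Fix a single dimension to one value (a dice on one dimension)."""
    return dice(facts, **{dim: {value}})

# Drill up: view totals at the coarser "year" level.
by_year = aggregate(FACTS, ("year",))
# Drill down: break the same totals out by quarter.
by_quarter = aggregate(FACTS, ("year", "quarter"))
# Slice: only the East region, summarised by product.
east_by_product = aggregate(slice_(FACTS, "region", "East"), ("product",))
# Dice: restrict two dimensions at once.
laptops_2023 = dice(FACTS, year={2023}, product={"Laptop"})
# Pivot: the same measure with the dimension order swapped.
by_product_year = aggregate(FACTS, ("product", "year"))
```

Drill up and drill down differ only in how many dimensions are kept in the grouping, which is why a single aggregate helper covers both in this sketch.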

Unit 4: Tuning for data warehouse

This unit includes the following topics: Computation of Data Cubes, modeling, OLAP data, OLAP queries, Data Warehouse back end tools, Tuning and testing of Data Warehouse

Unit 5: Data Mining techniques

This unit includes the following topics: Data Mining definition and tasks, KDD versus Data Mining, Data Mining techniques, tools and applications

Unit 6: Data mining query languages

This unit includes the following topics: Data mining query languages, Data specification, specifying knowledge, hierarchy specification, pattern presentation & visualization specification, Data mining languages and standardization of data mining

Unit 7: Association analysis

This unit includes the following topics: Association Rule Mining, Why is Association Mining necessary?, Pros and Cons of Association Rules, Apriori Algorithm
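
To give a feel for the Apriori Algorithm listed above, here is a minimal sketch that finds frequent itemsets by level-wise candidate generation and pruning. The function name, the support threshold expressed as an absolute count, and the sample transactions are assumptions made for this example; rule generation from the frequent itemsets is not shown.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    `min_support` is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count support of the surviving candidates against the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

With a minimum support of 3, only one pair, bread and milk, survives alongside the four single items, so the search stops before any three-item candidate is generated.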

Unit 8: Cluster analysis, Classification and Prediction

This unit includes the following topics: What is classification? What is prediction?, Issues regarding classification and prediction (Preparing the data for classification and prediction, Comparing classification methods), Classification by decision tree induction (Extracting classification rules from decision trees), Bayesian Classification, Classification by back propagation, Introduction to Regression (Types of Regression), Clustering Algorithms (K-means and K-medoids Algorithms)
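
As one example of the clustering algorithms in this unit, the sketch below shows the basic K-means loop: assign each point to its nearest centroid, recompute each centroid as the mean of its members, and repeat until nothing changes. The function signature, the caller-supplied starting centroids and the sample points are assumptions made for illustration; practical implementations usually pick starting centroids randomly or with a seeding heuristic.

```python
def kmeans(points, centroids, max_iter=100):
    """Basic K-means with squared Euclidean distance.

    `centroids` holds the starting centroids, supplied by the caller in this sketch.
    """
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(xs) / len(members) for xs in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data forming two well-separated groups.
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
final_centroids, final_clusters = kmeans(data, centroids=[(1.0, 1.0), (8.0, 8.0)])
```

K-medoids follows the same assign-and-update pattern but restricts each cluster centre to be an actual data point, which makes it less sensitive to outliers.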

Unit 9: Advanced concepts in data mining

This unit includes the following topics: Mining Text Databases, Mining the World Wide Web, Mining Multimedia and Spatial Databases