Data Warehousing and Data Mining - Unit Wise Questions

Unit 1: Introduction to Data Warehousing
52 Questions

1. What are the key steps in knowledge discovery in databases? Explain.

10 marks | Asked in 2071 (II)

1. Differentiate between Data-Warehouse and Data-mining. Explain the stages of knowledge discovery in database with example.

10 marks | Asked in 2072

1. Differentiate between Data-Warehouse and Data-mining.

10 marks | Asked in 2073

1. Write down any one advantage and one disadvantage of MOLAP over ROLAP. Define a signed network. How do you check whether it is balanced or not? How does beam search reduce the space complexity? Illustrate with an example. [2+4+4]

10 marks | Asked in 2078(New Course)

2. Why do we need to preprocess the data before running the algorithm? What are the processes for this? Explain. Give some examples of noise that must be removed from the data while extracting patterns.

10 marks | Asked in 2074

2. Explain the functionalities and classification of data mining system with example.

10 marks | Asked in 2071 (II)

2. Explain the various data mining task primitives in detail.

10 marks | Asked in 2071

2. How is a concept hierarchy used in extracting information? Generate the frequent patterns from the following data set using FP-growth, where minimum support = 3. [2+8]

10 marks | Asked in 2078(New Course)

3. Explain the architecture and implementation of data warehouse with example.

10 marks | Asked in 2069

3. How do you compare two classifiers? Given the points A(3,7), B(4,6), C(5,5), D(6,4), E(7,3), F(6,2), G(7, 2), and H(8,4), find the core points and outliers using DBSCAN. Take Eps = 2.5 and MinPts = 3. [2+8]

10 marks | Asked in 2078(New Course)
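
For the DBSCAN part of the question above, a hand computation can be checked with a short Python sketch. It assumes Euclidean distance and that a point counts as its own neighbour when compared against MinPts (a common convention that the question does not state). The function name classify_points is illustrative only.

```python
from math import dist

# Points, Eps and MinPts are taken from the question.
points = {
    "A": (3, 7), "B": (4, 6), "C": (5, 5), "D": (6, 4),
    "E": (7, 3), "F": (6, 2), "G": (7, 2), "H": (8, 4),
}
EPS = 2.5
MIN_PTS = 3


def classify_points(points, eps, min_pts):
    """Return (core, border, noise) label sets for DBSCAN.

    Assumption: the Eps-neighbourhood of a point includes the point itself.
    """
    neighbours = {
        p: {q for q in points if dist(points[p], points[q]) <= eps}
        for p in points
    }
    core = {p for p, n in neighbours.items() if len(n) >= min_pts}
    # A border point is not core but lies within Eps of at least one core point.
    border = {p for p in points if p not in core and neighbours[p] & core}
    noise = set(points) - core - border
    return core, border, noise


core, border, noise = classify_points(points, EPS, MIN_PTS)
print("core:", sorted(core))
print("border:", sorted(border))
print("outliers:", sorted(noise))
```

Under these assumptions B, C, D, E, F, G and H are core points, A is a border point (it lies within Eps of the core point B), and no point is an outlier. If the point itself is excluded from the neighbour count, the result changes, so an answer should state which convention it uses.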

3. Explain the architecture of data mining system with schematic diagram.

10 marks | Asked in 2071

3. What kind of data preprocessing do we need before applying a data mining algorithm to any data set? Explain the binning method to handle noisy data with an example.

10 marks | Asked in Model

4. What are the stages of knowledge discovery in database (KDD)?

5 marks | Asked in 2071 (II)

4. What are the basic stages of KDD?

5 marks | Asked in 2071

4. What is the purpose of cluster analysis in data mining? Explain.

5 marks | Asked in 2075

5. How does KDD differ from data mining? Describe the stages of data mining.

5 marks | Asked in 2075

2. "Data mining is a part of KDD". Do you agree or disagree? Justify. Explain the different stages in KDD. [3+7]

10 marks | Asked in 2078

5.  Describe the types of data used in data mining.

5 marks | Asked in 2074

4. How does classification play a significant role in data mining? Explain.

5 marks | Asked in 2076

4. When is a pattern said to be interesting? List the issues of data mining. [1+4]

5 marks | Asked in 2078(New Course)

5. Is the information given by data mining always useful? What are the issues in data warehousing and data mining?

5 marks | Asked in 2076

6. Differentiate between OLAP and OLTP.

5 marks | Asked in 2069

3. How can data be modeled in a multidimensional data model? Explain the conceptual modeling of a data warehouse. [4+6]

10 marks | Asked in 2078

5. Define data discretization. Describe the tasks for data preprocessing. [1+4]

5 marks | Asked in 2078(New Course)

7. Differentiate between KDD and Data Mining.

5 marks | Asked in 2073

6. Explain the four characteristics of a data warehouse.

5 marks | Asked in 2076

7. Differentiate between KDD and Data Mining.

5 marks | Asked in 2072

6. Define spatial data mining. What are the challenges of multimedia mining? Describe with an example. [2+3]

5 marks | Asked in 2078(New Course)

7. Consider the following data set.

Find out whether the object with attributes Confident = Yes and Sick = No will Fail or Pass using Bayesian classification. [5]

5 marks | Asked in 2078(New Course)

4. In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem. [5]

5 marks | Asked in 2078

8. What are the choices for data cube materialization? Explain the strategies for cube computation. [2+3]

5 marks | Asked in 2078(New Course)

5. Can we use an operational database instead of a data warehouse? List the characteristics of a data warehouse. [1+4]

5 marks | Asked in 2078

9. Show the conflict between the theory of balance and the theory of status. How do you improve Apriori? [2+3]

5 marks | Asked in 2078(New Course)

6. Why is it necessary to pre-compute the data cube? What are the possible issues in performing data cube computation? [3+2]

5 marks | Asked in 2078

10. Differentiate between star schema and snowflake schema. List any two methods for data normalization. [2+3]

5 marks | Asked in 2078(New Course)

7. Describe any three methods to normalize the group of data.[5]

5 marks | Asked in 2078

11. Differentiate between KDD and data mining.

5 marks | Asked in 2076

10. Describe the use of genetic algorithms as a problem-solving technique in data mining.

5 marks | Asked in Model

11. How do you evaluate the accuracy of a classifier? Discuss the advantages of using K-fold cross validation. [2+3]

5 marks | Asked in 2078(New Course)

13. Write short notes (Any Two)

     a) MOLAP

     b) Data cubes

     c) Snowflakes

     d) Regression

5 marks | Asked in 2071 (II)

8. What is the significance of association rules in data mining? List the types of association rules with examples. [2+3]

5 marks | Asked in 2078

13. Write short notes (Any Two)

     a) Stars

     b) HOLAP

     c) Data Specification

     d) Mining and world wide web (WWW)

5 marks | Asked in 2071

13. Write short notes (Any Two)

    a) HOLAP

    b) Hierarchy specification

    c) Spatial Database

5 marks | Asked in 2072

13. Write short notes (Any Two)

     a) Data cubes

     b) HOLAP

     c) Spatial Database

5 marks | Asked in 2073

12. Apply the K-Means algorithm (K = 2) to the data (185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77) for up to two iterations and show the clusters. Choose the first two objects as the initial centroids. [5]

5 marks | Asked in 2078(New Course)
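
The two iterations asked for above can be traced with a minimal Python sketch. It assumes Euclidean distance and the usual assign-then-update loop. The function name k_means is illustrative and not taken from any library.

```python
from math import dist

# Data and initial centroids (the first two objects) are taken from the question.
data = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
initial_centroids = [data[0], data[1]]


def k_means(points, centroids, iterations):
    """Run a fixed number of K-means iterations; return (clusters, centroids)."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids


clusters, centroids = k_means(data, initial_centroids, iterations=2)
for cluster, centroid in zip(clusters, centroids):
    print(centroid, cluster)
```

Under these assumptions both iterations give the same grouping: {(185, 72), (179, 68), (182, 72), (188, 77)} with centroid (183.5, 72.25), and {(170, 56), (168, 60)} with centroid (169, 58).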

9. How do you index OLAP data? Give examples.[5]

5 marks | Asked in 2078

13. Write short notes (Any Two)

a) Text Database Mining

b) Back propagation Algorithm

c) Regression

d) HOLAP

5 marks | Asked in Model

13. Write short notes on (Any Two)

a. Evolution analysis

b. Decision trees

c. Text mining

d. Classification using Regression

5 marks | Asked in 2076

10. Apriori needs to scan the dataset many times, which reduces its efficiency. Explain some mechanisms to improve its efficiency. [5]

5 marks | Asked in 2078

11. Differentiate between OLTP and OLAP. [5]

5 marks | Asked in 2078

12. Which approach is better for clustering, hierarchical or partitioning? Justify. List some drawbacks of k-means. [2+3]

5 marks | Asked in 2078

13. Write short notes.(Any Two)

a. Outlier Analysis

b. Web Mining

c. Query Manager

d. Pros and Cons of Association rules

5 marks | Asked in 2078

Unit 2: Introduction to Data Mining
24 Questions

1. Explain the architecture of Data mining system with block diagram.

10 marks | Asked in 2069

2. Explain the DBMS vs. Data Warehouse.

10 marks | Asked in 2073

2. Do pattern and information refer to the same aspect? Justify. Differentiate between a data warehouse and an operational database.

10 marks | Asked in 2076

1. Suppose that a data warehouse for Big University consists of the following four dimensions: student, course, semester, and instructor, and two measures count and avg-grade. When at the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg-grade measure stores the actual course grade of the student. At higher conceptual levels, avg-grade stores the average grade for the given combination.

a) Draw a snowflake schema diagram for the data warehouse.

b) Starting with the base cuboid [student, course, semester, instructor], what specific OLAP operations (e.g., roll-up from semester to year) should one perform in order to list the average grade of CS courses for each Big University student?

c) If each dimension has five levels (including all), such as “student < major < status < university < all”, how many cuboids will this cube contain (including the base and apex cuboids)?

10 marks | Asked in Model
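
For part (c) above, the total number of cuboids in a cube with concept hierarchies is commonly computed as the product, over all dimensions, of (L_i + 1), where L_i is the number of levels of dimension i excluding the virtual top level "all". The sketch below assumes that formula. The helper name count_cuboids and its includes_all flag are illustrative.

```python
from math import prod


def count_cuboids(levels_per_dimension, includes_all=True):
    """Count cuboids as the product of (L_i + 1) over all dimensions.

    If the given level counts already include the top level "all",
    each count is used as-is; otherwise 1 is added for "all".
    """
    return prod(n if includes_all else n + 1 for n in levels_per_dimension)


# Four dimensions (student, course, semester, instructor),
# each with five levels including "all", as stated in the question.
total = count_cuboids([5, 5, 5, 5])
print(total)  # 5 ** 4 = 625
```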

3. Explain about the architecture and implementation of data warehouse with example.

10 marks | Asked in 2071 (II)

3. Explain the data warehouse architecture. Differentiate between distributed and virtual data warehouse.

10 marks | Asked in 2072

4. What do you mean by knowledge discovery in database (KDD)?

5 marks | Asked in 2069

4. Differentiate between data marts and metadata.

5 marks | Asked in 2073

4. Explain the multidimensional data model with example.

5 marks | Asked in 2072

5. List down the functionality of meta data.

5 marks | Asked in 2071 (II)

5. What do you mean by a virtual data warehouse?

5 marks | Asked in 2073

5. Explain the application of data warehouse and data mining.

5 marks | Asked in 2069

5. Differentiate between DBMS and Data Warehouse.

5 marks | Asked in 2071

6. Explain the distributed and virtual data warehouse.

5 marks | Asked in 2071

6.  Explain the similarities and dissimilarities between operational database and data warehouse.

5 marks | Asked in 2074

7. Explain the multidimensional data model.

5 marks | Asked in 2071 (II)

5. Differentiate between data marts and data cubes.

5 marks | Asked in Model

7. Explain the data mining techniques.

5 marks | Asked in 2069

8. How are different schemas used to model a data warehouse? Explain.

5 marks | Asked in 2075

9. Why is data cube computation an essential task in data mining? Describe the general strategy in data cube computation.

5 marks | Asked in 2074

8. How does the multidimensional data model help in retrieving information? Explain with a suitable example.

5 marks | Asked in 2076

10.  Describe the different components of a data warehouse.

5 marks | Asked in 2074

11. Define dimension table and fact table. Why is a multidimensional data model necessary?

5 marks | Asked in 2074

12. What is DMQL? How do you define Star Schema using DMQL?

5 marks | Asked in Model

Unit 3: Data Preprocessing
10 Questions

2. Describe how bitmap and join indexing are used to represent OLAP data. Explain the different components of data warehouse.

10 marks | Asked in 2075

5. Differentiate between OLTP and OLAP.

5 marks | Asked in 2072

6. Differentiate between OLAP and OLTP.

5 marks | Asked in 2071 (II)

6. Explain OLAP operations with examples.

5 marks | Asked in 2075

7.  List the types of OLAP operations with example.

5 marks | Asked in 2074

6. Explain OLAP operations with examples.

5 marks | Asked in Model

9. Compare the OLAP servers, ROLAP, MOLAP and HOLAP.

5 marks | Asked in 2076

11. Differentiate between OLTP and OLAP.

5 marks | Asked in 2071

12. Explain the data mining languages.

5 marks | Asked in 2069

13.  Write short notes on (any two):

            a)  Concept hierarchy

            b)  Data mining Query Language

            c)  Text mining

            d)  ROLAP vs MOLAP

5 marks | Asked in 2074

Unit 4: Data Cube Technology
8 Questions

6. Explain the tuning and testing of Data Warehouse.

5 marks | Asked in 2073

6. Explain the tuning and testing of Data Warehouse.

5 marks | Asked in 2072

8. List down the data mining tools.

5 marks | Asked in 2071 (II)

8. What are the data warehouse back end tools? Explain.

5 marks | Asked in 2071

7. Explain the optimization techniques in data cube computation.

5 marks | Asked in 2076

9. Describe the significance of pre-computation of a data cube.

5 marks | Asked in 2075

11. What is data cube? Explain with example.

5 marks | Asked in 2069

12. What does data warehouse tuning mean? Describe the parameters.

5 marks | Asked in 2076

Unit 5: Mining Frequent Patterns
2 Questions

7. Explain the data cube with example.

5 marks | Asked in 2071

8. Explain the Apriori Algorithm.

5 marks | Asked in 2069

Unit 6: Classification and Prediction
6 Questions

1. Consider the following training dataset of 14 examples, which assigns a credit risk of high, moderate or low to people based on the following properties of their credit rating:

a. Collateral with possible values { Adequate, None}

b. Income with possible values {"Rs 0K to Rs 15K","Rs 15 K to Rs 35K","Over Rs 35 K"}

c. Debt with possible values{ High, Low}

d. Credit history with possible values {Good, Bad, Unknown}

Classify the individual with credit history=unknown, debt  = low, collateral = adequate and income = Rs 15K to Rs 35K using decision tree algorithm. Use ID3 algorithm for building the decision tree.[10]


10 marks | Asked in 2078

4. List and describe the five primitives for specifying a data mining task.

5 marks | Asked in 2074

7. Explain the primitives of data mining query language.

5 marks | Asked in 2075

8. Explain the data mining query language with example.

5 marks | Asked in 2072

8. Explain the data mining query language.

5 marks | Asked in 2073

10. Give the syntax and an example of a data mining query language.

5 marks | Asked in 2076

Unit 7: Cluster Analysis
15 Questions

1. What do you mean by representative object based clustering technique? Explain in detail with example.

10 marks | Asked in 2071

1. Discuss the types of web mining. Explain why K-means is sensitive to outliers and how K-Medoid minimizes this issue.

10 marks | Asked in 2076

2. Define clustering. Explain the partitioning and hierarchical clustering methods with examples.

10 marks | Asked in 2069

2. What do you mean by clustering? Explain the K-Means and K-Medoid algorithms with examples.

10 marks | Asked in 2072

3. List the two steps used in the classification approach with their issues. Is it always the right decision to use a neural network as a classifier? Give your opinion. Discuss the working mechanism of the back propagation classification algorithm.

10 marks | Asked in 2074

3. Explain the K-Means and K-Medoid algorithms with examples.

10 marks | Asked in 2073

8. Illustrate the strengths and weaknesses of the k-means algorithm in comparison with the k-medoids algorithm.

5 marks | Asked in 2074

9. Explain the K-Medoid Algorithm.

5 marks | Asked in 2069

7. List the drawbacks of the ID3 algorithm, including over-fitting, and the techniques to remedy it.

5 marks | Asked in Model

8. Write the algorithm for K-means clustering. Compare it with k-nearest neighbor algorithm.

5 marks | Asked in Model

10. What are the types of Regression? Explain.

5 marks | Asked in 2072

10. Explain the types of Regression.

5 marks | Asked in 2073

10. What is the objective of K-means algorithm?

5 marks | Asked in 2071 (II)

12. Discuss the approach behind Bayesian classification. Why is a smoothing technique necessary in Bayesian classification?

5 marks | Asked in 2074

13. Write short notes (Any Two)

     a) OLAP queries

     b) Snowflakes

     c) K-means

     d) Mining text databases

5 marks | Asked in 2069

Unit 8: Graph Mining and Social Network Analysis
11 Questions

1. You are given the transaction data shown below from a fast food restaurant. There are 9 distinct transactions (order 1 to order 9). A total of 5 meal items (M1 to M5) are involved in the transactions.

Meal Items          List of item IDs

order 1             M1, M2, M5
order 2             M2, M4
order 3             M2, M3
order 4             M1, M2, M4
order 5             M1, M3
order 6             M2, M3
order 7             M1, M3
order 8             M1, M2, M3, M5
order 9             M1, M2, M3

Minimum support = 2, minimum confidence = 0.7

Apply the Apriori algorithm to the database to identify frequent k-itemset and find all strong association rules.

10 marks | Asked in 2074
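
The frequent-itemset part of the question above can be checked against a hand trace with a minimal Apriori sketch in Python. It assumes that "minimum support = 2" is an absolute transaction count. The function name apriori is illustrative, and the sketch stops at frequent itemsets without generating rules.

```python
from itertools import combinations

# The nine orders from the question, as sets of meal item IDs.
orders = [
    {"M1", "M2", "M5"}, {"M2", "M4"}, {"M2", "M3"},
    {"M1", "M2", "M4"}, {"M1", "M3"}, {"M2", "M3"},
    {"M1", "M3"}, {"M1", "M2", "M3", "M5"}, {"M1", "M2", "M3"},
]


def apriori(transactions, min_support):
    """Return {frequent itemset: support count} using level-wise search."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Scan the transactions once per level to count candidate supports.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


frequent = apriori(orders, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Under this assumption the sketch finds five frequent 1-itemsets, six frequent 2-itemsets, and two frequent 3-itemsets, {M1, M2, M3} and {M1, M2, M5}, each with support 2.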

3. List the problems of Apriori algorithm with its possible solutions. Consider the following transaction dataset.

Transaction_ID          Item_List

T1                                 {K, A, D, B}

T2                                 {D,A,C,E,B}

T3                                 {C,A,B,E}

T4                                 {B,A,D}

What association rules can be found in this set if the minimum support is 3 and the minimum confidence is 80%?

10 marks | Asked in 2076
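
Because the dataset above has only four transactions, the strong rules can be cross-checked by brute-force enumeration instead of a full Apriori run. The sketch below assumes that minimum support 3 is an absolute count and that confidence(X -> Y) = support(X and Y together) / support(X). The names support and strong_rules are illustrative.

```python
from itertools import combinations

# Transactions T1..T4 from the question.
transactions = [
    {"K", "A", "D", "B"},
    {"D", "A", "C", "E", "B"},
    {"C", "A", "B", "E"},
    {"B", "A", "D"},
]


def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def strong_rules(min_support, min_confidence):
    """Enumerate all rules lhs -> rhs that meet both thresholds (brute force)."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            whole = frozenset(combo)
            if support(whole) < min_support:
                continue
            # Split the frequent itemset into every non-empty lhs / rhs pair.
            for r in range(1, size):
                for lhs in map(frozenset, combinations(combo, r)):
                    confidence = support(whole) / support(lhs)
                    if confidence >= min_confidence:
                        rules.append((lhs, whole - lhs, confidence))
    return rules


rules = strong_rules(min_support=3, min_confidence=0.8)
for lhs, rhs, confidence in rules:
    print(sorted(lhs), "->", sorted(rhs), f"confidence={confidence:.2f}")
```

Under these assumptions the frequent itemsets are {A}, {B}, {D}, {A, B}, {A, D}, {B, D} and {A, B, D}, and seven rules pass the 80% threshold, all with confidence 100%: A -> B, B -> A, D -> A, D -> B, D -> {A, B}, {A, D} -> B and {B, D} -> A. Rules such as A -> D reach only 75% and are rejected.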

3. Give any two types of association rules with example. Trace the results of using the Apriori algorithm on the grocery store example with support threshold 2 and confidence threshold 60 %. Show the candidate and frequent itemsets for each database scan. Enumerate all the final frequent itemsets. Also indicate the association rules that are generated.

Transaction_ID          Items

T1                      HotDogs, Buns, Ketchup
T2                      HotDogs, Buns
T3                      HotDogs, Coke, Chips
T4                      Chips, Coke
T5                      Chips, Ketchup
T6                      HotDogs, Coke, Chips

10 marks | Asked in 2075

2. Let A = {A1, A2, A3, A4, A5, A6} and assume σ = 35%. Use the Apriori algorithm to get the desired solution.


A1  A2  A3  A4  A5  A6
0   0   0   1   1   1
0   1   1   1   0   0
1   0   0   1   1   1
1   1   0   1   0   0
1   0   1   0   1   1
0   1   1   1   0   1
0   0   0   1   1   0
0   1   0   1   0   1
1   0   0   1   0   0
1   1   1   1   1   1


10 marks | Asked in Model

4. Explain the use of frequent item set generation process.

5 marks | Asked in Model

9. Explain the Apriori Algorithm.

5 marks | Asked in 2073

9. What are the advantages and disadvantages of association rules?

5 marks | Asked in 2072

9. Write down the two measures of association rule.

5 marks | Asked in 2071 (II)

11. Explain the association rules with advantages and disadvantages.

5 marks | Asked in 2073

11. Explain the Apriori Algorithm.

5 marks | Asked in 2072

12. Explain the Apriori Algorithm.

5 marks | Asked in 2071

Unit 9: Mining Spatial, Multimedia, Text and Web Data
10 Questions

1. List some issues of multimedia mining. Describe how back propagation is used in classification.

10 marks | Asked in 2075

9. Explain the data mining tasks performed on a text database.

5 marks | Asked in 2071

10. Define the spatial database and its features.

5 marks | Asked in 2071

10. Define the spatial database and its features.

5 marks | Asked in 2069

11. Explain the application of spatial databases.

5 marks | Asked in 2071 (II)

9. What is text mining? Explain the text indexing techniques.

5 marks | Asked in Model

12. Explain mining text databases.

5 marks | Asked in 2073

12. Explain the application of mining used in WWW.

5 marks | Asked in 2072

12. Explain the methods of mining multimedia database.

5 marks | Asked in 2071 (II)

11. What do you mean by WWW mining? Explain WWW mining techniques.

5 marks | Asked in Model