B.Sc Computer Science and Information Technology

Institute of Science and Technology, TU

Data Warehousing and Data Mining (CSC420)

Year Asked: 2081 (syllabus-wise questions)

Classification and Prediction
1.
Define overfitting and underfitting. Train a decision tree classifier using the ID3 algorithm on the following training data.

$\begin{array}{c|c|c|c} \text{TID} & \text{Age} & \text{Car Type} & \text{Class} \\ \hline 1 & \leq 30 & \text{Family} & \text{High} \\ 2 & \leq 30 & \text{Sports} & \text{High} \\ 3 & >30 & \text{Sports} & \text{High} \\ 4 & >30 & \text{Family} & \text{Low} \\ 5 & >30 & \text{Truck} & \text{Low} \\ 6 & \leq 30 & \text{Family} & \text{High} \\ \end{array}$
[10]
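A minimal Python sketch of ID3 on this training set, assuming "Car Type" is a single attribute (each row carries four values) and using the string labels `"<=30"` and `">30"` for Age. It computes $Info(D) = -\sum_i p_i \log_2 p_i$ and $Gain(A) = Info(D) - \sum_j \frac{|D_j|}{|D|} Info(D_j)$, then splits recursively. On this data Age and Car Type both reach a gain of about 0.459 bits; the sketch breaks the tie by keeping the attribute listed first, which is an assumption (ID3 itself does not fix a tie rule).

```python
import math
from collections import Counter

# Training data from the question; "Car Type" is treated as one attribute.
DATA = [
    {"Age": "<=30", "Car Type": "Family", "Class": "High"},
    {"Age": "<=30", "Car Type": "Sports", "Class": "High"},
    {"Age": ">30", "Car Type": "Sports", "Class": "High"},
    {"Age": ">30", "Car Type": "Family", "Class": "Low"},
    {"Age": ">30", "Car Type": "Truck", "Class": "Low"},
    {"Age": "<=30", "Car Type": "Family", "Class": "High"},
]

def entropy(rows, target="Class"):
    """Info(D) = -sum p_i * log2(p_i) over the class distribution of rows."""
    n = len(rows)
    counts = sorted(Counter(r[target] for r in rows).values())
    return -sum((c / n) * math.log2(c / n) for c in counts)

def info_gain(rows, attr, target="Class"):
    """Gain(A) = Info(D) - sum_j |D_j|/|D| * Info(D_j)."""
    n = len(rows)
    remainder = 0.0
    for value in dict.fromkeys(r[attr] for r in rows):  # first-seen order
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attrs, target="Class"):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:   # pure node becomes a leaf
        return labels[0]
    if not attrs:               # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    # Highest gain wins; on a tie, max() keeps the attribute listed first.
    best = max(attrs, key=lambda a: round(info_gain(rows, a, target), 12))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in dict.fromkeys(r[best] for r in rows):
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, rest, target)
    return {best: branches}

if __name__ == "__main__":
    for attr in ("Age", "Car Type"):
        print(attr, round(info_gain(DATA, attr), 4))
    print(id3(DATA, ["Age", "Car Type"]))
```

With Age at the root, the `<=30` branch is pure (High), and the `>30` branch splits on Car Type into Sports (High), Family (Low) and Truck (Low).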
2.
What is a support vector? Describe how the accuracy of a classifier is evaluated. [5]
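A hedged sketch of the usual confusion-matrix measures for a two-class classifier: accuracy $(TP+TN)/(P+N)$, error rate, precision, recall and F1. The label vectors are hypothetical; in practice the predictions should come from tuples not used for training (for example a holdout set or k-fold cross-validation).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def evaluate(y_true, y_pred, positive=1):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "error_rate": (fp + fn) / total,
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical test-set labels (1 = positive class).
Y_TRUE = [1, 1, 1, 0, 0, 0, 1, 0]
Y_PRED = [1, 0, 1, 0, 1, 1, 1, 0]

if __name__ == "__main__":
    print(confusion_counts(Y_TRUE, Y_PRED))
    print(evaluate(Y_TRUE, Y_PRED))
```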
Cluster Analysis
1.
Using the k-means++ algorithm and Euclidean distance, find the initial three cluster centroids from A1 = (3, 11), A2 = (3, 6), A3 = (9, 5), A4 = (6, 9), A6 = (7, 5), A7 = (2, 3), A8 = (5, 10). Choose (3, 11) as one of the initial centroids. [5]
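A sketch of k-means++ seeding on the seven points listed (A5 is not given in the question, so it is left out). For each point it computes $D(x)^2$, the squared distance to the nearest centroid chosen so far. Standard k-means++ samples the next centroid with probability $D(x)^2 / \sum D^2$; exam solutions often take the point with the largest $D(x)^2$ instead, so the sketch supports both, and the greedy choice is an assumption about the expected answer.

```python
import random

# The points listed in the question (A1..A4, A6..A8; A5 is not given).
POINTS = [(3, 11), (3, 6), (9, 5), (6, 9), (7, 5), (2, 3), (5, 10)]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def d2_weights(points, centroids):
    """D(x)^2: squared distance from each point to its nearest chosen centroid."""
    return [min(squared_distance(p, c) for c in centroids) for p in points]

def kmeans_pp_init(points, first, k, rng=None):
    centroids = [first]
    while len(centroids) < k:
        d2 = d2_weights(points, centroids)
        if rng is None:
            # Deterministic variant: take the point with the largest D(x)^2.
            nxt = points[max(range(len(points)), key=d2.__getitem__)]
        else:
            # Standard k-means++: sample x with probability D(x)^2 / sum(D^2).
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centroids.append(nxt)
    return centroids

if __name__ == "__main__":
    print(d2_weights(POINTS, [(3, 11)]))
    print(kmeans_pp_init(POINTS, (3, 11), 3))
    print(kmeans_pp_init(POINTS, (3, 11), 3, rng=random.Random(0)))
```

Under the greedy rule the second centroid is A3 = (9, 5) with $D^2 = 72$ out of a total of 232, and the third is A7 = (2, 3) with $D^2 = 53$ out of a total of 100.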
2.
Differentiate between the k-means and k-medoids clustering algorithms. [5]
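One key difference is how each algorithm represents a cluster. The minimal sketch below uses a hypothetical one-dimensional cluster with an outlier: the k-means center is the mean, which the outlier drags away and which need not be a data point, while the k-medoids center is the actual member with the smallest total distance to the others.

```python
def kmeans_center(values):
    """k-means represents a cluster by its mean, which need not be a data point."""
    return sum(values) / len(values)

def kmedoids_center(values):
    """k-medoids uses the member with the smallest total distance to the others."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))

CLUSTER = [1, 2, 3, 4, 100]  # hypothetical cluster with one outlier

if __name__ == "__main__":
    print(kmeans_center(CLUSTER))    # 22.0
    print(kmedoids_center(CLUSTER))  # 3
```

Searching for the medoid costs more than averaging, which is why k-medoids (e.g. PAM) is slower than k-means on large data sets.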
Data Cube Technology
1.
Explain the general strategies for cube computation. [5]
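Two of the general strategies, grouping tuples by hashing and aggregating each cuboid from the smallest previously computed child (more detailed) cuboid, can be sketched as follows. The fact table is hypothetical, and the sketch materializes a full cube; it does not show simultaneous aggregation with caching or Apriori pruning for iceberg cubes.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("city", "item", "year")
# Hypothetical base facts: (city, item, year, sales)
FACTS = [
    ("Kathmandu", "TV", 2023, 10),
    ("Kathmandu", "Phone", 2023, 5),
    ("Pokhara", "TV", 2024, 7),
    ("Pokhara", "Phone", 2023, 3),
]

def full_cube(facts, n_dims):
    """Return {cuboid: {group key: total}}; a cuboid is a tuple of dimension indices."""
    cube = {}
    # Base cuboid: hash-based grouping of the raw facts on all dimensions.
    base = defaultdict(int)
    for *key, measure in facts:
        base[tuple(key)] += measure
    cube[tuple(range(n_dims))] = dict(base)
    # Move up the lattice one level at a time, towards the apex cuboid.
    for size in range(n_dims - 1, -1, -1):
        for cuboid in combinations(range(n_dims), size):
            # Candidate children: computed cuboids with exactly one extra dimension.
            children = [c for c in cube if len(c) == size + 1 and set(cuboid) <= set(c)]
            child = min(children, key=lambda c: len(cube[c]))  # smallest child
            agg = defaultdict(int)
            for child_key, value in cube[child].items():
                key = tuple(child_key[child.index(d)] for d in cuboid)
                agg[key] += value
            cube[cuboid] = dict(agg)
    return cube

if __name__ == "__main__":
    cube = full_cube(FACTS, len(DIMS))
    for cuboid, cells in cube.items():
        print([DIMS[d] for d in cuboid] or "apex", cells)
```

Reading from a small child instead of the raw facts means each higher-level cuboid scans fewer cells.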
2.
List any two OLAP operations with examples. How do you compute rule coverage and rule accuracy? [5]
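A small sketch of both parts, using hypothetical data. Roll-up climbs the location hierarchy from city to country and re-aggregates; slice fixes one dimension (here the quarter). For a rule R, coverage is $n_{covers}/|D|$ and accuracy is $n_{correct}/n_{covers}$, where $n_{covers}$ counts tuples satisfying the antecedent and $n_{correct}$ those that also satisfy the consequent.

```python
from collections import defaultdict

# Hypothetical sales facts: (city, quarter, amount) and a city -> country hierarchy.
SALES = [("Kathmandu", "Q1", 100), ("Pokhara", "Q1", 50),
         ("Delhi", "Q1", 80), ("Kathmandu", "Q2", 70)]
CITY_TO_COUNTRY = {"Kathmandu": "Nepal", "Pokhara": "Nepal", "Delhi": "India"}

def roll_up(facts, hierarchy):
    """Roll-up: replace city by country and re-aggregate per (country, quarter)."""
    totals = defaultdict(int)
    for city, quarter, amount in facts:
        totals[(hierarchy[city], quarter)] += amount
    return dict(totals)

def slice_cube(facts, quarter):
    """Slice: select on a single dimension value to get a sub-cube."""
    return [f for f in facts if f[1] == quarter]

def rule_coverage_accuracy(data, antecedent, consequent):
    covered = [t for t in data if all(t[a] == v for a, v in antecedent.items())]
    correct = [t for t in covered if all(t[a] == v for a, v in consequent.items())]
    coverage = len(covered) / len(data)
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

# Hypothetical tuples for R: IF age = youth AND student = yes THEN buys = yes
D = [
    {"age": "youth", "student": "yes", "buys": "yes"},
    {"age": "youth", "student": "yes", "buys": "no"},
    {"age": "youth", "student": "no", "buys": "no"},
    {"age": "senior", "student": "yes", "buys": "yes"},
    {"age": "youth", "student": "yes", "buys": "yes"},
    {"age": "middle_aged", "student": "no", "buys": "yes"},
]

if __name__ == "__main__":
    print(roll_up(SALES, CITY_TO_COUNTRY))
    print(slice_cube(SALES, "Q1"))
    print(rule_coverage_accuracy(D, {"age": "youth", "student": "yes"}, {"buys": "yes"}))
```

Here R covers 3 of 6 tuples (coverage 50%) and classifies 2 of those 3 correctly (accuracy about 66.7%).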
Data Preprocessing
1.
Describe any two methods of handling noisy data. [5]
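Binning is one common method: sort the values, partition them into equal-frequency bins, and smooth each bin either by its mean or by its boundaries (each value replaced by the closer of the bin's minimum and maximum). The sketch uses illustrative sorted price values and a bin size of 3; ties in boundary smoothing go to the lower boundary, which is an assumption. Regression and outlier analysis by clustering are other methods not shown here.

```python
def equal_frequency_bins(values, bin_size):
    data = sorted(values)
    return [data[i:i + bin_size] for i in range(0, len(data), bin_size)]

def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value by the closer bin boundary (ties go to the lower one)."""
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

PRICES = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # illustrative values

if __name__ == "__main__":
    bins = equal_frequency_bins(PRICES, 3)
    print(bins)
    print(smooth_by_means(bins))
    print(smooth_by_boundaries(bins))
```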
Graph Mining and Social Network Analysis
1.
Define graph mining. Discuss the conflict between the theory of balance and the theory of status. [5]
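A minimal sketch of where the two theories disagree, for a directed triad A→B, B→C with edge signs ±1, predicting the sign of the closing edge A→C. The status convention used here (a positive edge X→Y means X regards Y as higher status, a negative edge as lower) follows the usual signed-network formulation and is the assumption of this sketch. When both edges are negative, balance ("the enemy of my enemy is my friend") predicts a positive A-C edge, whereas status predicts A→C negative because A ranks above B and B above C.

```python
def balance_prediction(sign_ab, sign_bc):
    """Structural balance (direction ignored): a triad is balanced when the product
    of its edge signs is positive, so the predicted A-C sign is the product."""
    return sign_ab * sign_bc

def status_prediction(sign_ab, sign_bc):
    """Status theory: returns +1 or -1 for the predicted A->C sign, or 0 when
    the status of C relative to A cannot be inferred."""
    diff = sign_ab + sign_bc  # status(C) - status(A), in edge units
    return (diff > 0) - (diff < 0)

if __name__ == "__main__":
    for ab, bc in [(+1, +1), (+1, -1), (-1, -1)]:
        print(ab, bc, "balance:", balance_prediction(ab, bc),
              "status:", status_prediction(ab, bc))
```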
2.
Define link mining. What are the roles of epsilon (Eps) and MinPts in DBSCAN? [5]
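Eps sets the radius of a point's neighbourhood and MinPts sets how many points that neighbourhood must hold for the point to be a core point; together they decide which points are core, border or noise. The sketch below labels hypothetical 2-D points with Eps = 1.5 and MinPts = 4, assuming the neighbourhood count includes the point itself (a common convention, stated here as an assumption).

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise' in DBSCAN terms."""
    def neighbours(p):
        return [q for q in points if math.dist(p, q) <= eps]  # includes p itself
    core = {p for p in points if len(neighbours(p)) >= min_pts}
    labels = {}
    for p in points:
        if p in core:
            labels[p] = "core"
        elif any(q in core for q in neighbours(p)):
            labels[p] = "border"   # not dense itself, but within Eps of a core point
        else:
            labels[p] = "noise"
    return labels

POINTS = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 2), (8, 8)]  # hypothetical data

if __name__ == "__main__":
    print(classify_points(POINTS, eps=1.5, min_pts=4))
```

Raising MinPts or shrinking Eps turns more points into border or noise points; the reverse merges more points into clusters.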
Introduction to Data Mining
1.
Distinguish between data characterization and data discrimination. What are the challenges of multimedia mining? [5]
Introduction to Data Warehousing
1.
When do we prefer the trimmed mean for the statistical description of data? Justify with an example. Describe the multidimensional data model and the conceptual modeling of a data warehouse. [10]
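The trimmed mean is preferred when a few extreme values would distort the ordinary mean. The sketch below drops a fixed proportion of the sorted values from each end before averaging; the salary list is hypothetical and the 10% trimming proportion is an assumption.

```python
def trimmed_mean(values, proportion):
    """Drop `proportion` of the sorted values from each end, then average the rest."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

# Hypothetical monthly salaries (in thousands) with one extreme value.
SALARIES = [30, 32, 35, 36, 38, 40, 42, 45, 48, 500]

if __name__ == "__main__":
    print(sum(SALARIES) / len(SALARIES))  # plain mean, pulled up by 500
    print(trimmed_mean(SALARIES, 0.1))    # 39.5
```

The plain mean (84.6) describes none of the typical salaries, while the 10% trimmed mean (39.5) sits among them.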
Mining Frequent Patterns
1.
How do you generate strong association rules? From the following dataset, find the frequent itemsets using the FP-growth algorithm with a minimum support count of 3.

$\begin{array}{c|c} \text{Transaction ID} & \text{Items} \\ \hline \text{T1} & \{K, E, M, O, Y\} \\ \text{T2} & \{K, E, O, Y\} \\ \text{T3} & \{K, E, M\} \\ \text{T4} & \{K, M, Y\} \\ \text{T5} & \{K, E, O\} \\ \end{array}$
[10]
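A compact FP-growth sketch for this dataset with a minimum support count of 3. It builds an FP-tree with items in descending support order (ties broken alphabetically, an assumption), then recursively mines each item's conditional pattern base, least frequent item first. Strong rules $A \Rightarrow B$ are then kept when $sup(A \cup B)/sup(A) \geq$ `min_conf`; the question gives no confidence threshold, so `min_conf = 0.8` is an assumption.

```python
from collections import defaultdict
from itertools import combinations

TRANSACTIONS = [{"K", "E", "M", "O", "Y"}, {"K", "E", "O", "Y"},
                {"K", "E", "M"}, {"K", "M", "Y"}, {"K", "E", "O"}]

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(weighted_transactions, min_sup):
    """Build an FP-tree from (items, count) pairs; return supports and header links."""
    support = defaultdict(int)
    for items, cnt in weighted_transactions:
        for item in items:
            support[item] += cnt
    frequent = {i: s for i, s in support.items() if s >= min_sup}
    root, header = Node(None, None), defaultdict(list)
    for items, cnt in weighted_transactions:
        # Keep frequent items only, in descending support order (ties by name).
        path = sorted((i for i in items if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += cnt
    return frequent, header

def fp_growth(weighted_transactions, min_sup, suffix=()):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets."""
    frequent, header = build_tree(weighted_transactions, min_sup)
    result = {}
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):  # least frequent first
        new_suffix = suffix + (item,)
        result[tuple(sorted(new_suffix))] = frequent[item]
        # Conditional pattern base: prefix path above each node holding `item`.
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_sup, new_suffix))
    return result

def strong_rules(freq, min_conf):
    """Rules A => B with confidence sup(A u B) / sup(A) >= min_conf."""
    rules = []
    for itemset, sup in freq.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                conf = sup / freq[lhs]
                if conf >= min_conf:
                    rhs = tuple(i for i in itemset if i not in lhs)
                    rules.append((lhs, rhs, conf))
    return rules

if __name__ == "__main__":
    freq = fp_growth([(t, 1) for t in TRANSACTIONS], min_sup=3)
    for itemset, sup in sorted(freq.items(), key=lambda kv: (len(kv[0]), kv[0])):
        print(itemset, sup)
    for lhs, rhs, conf in strong_rules(freq, min_conf=0.8):  # assumed threshold
        print(lhs, "=>", rhs, round(conf, 2))
```

On this data the sketch finds 11 frequent itemsets; the largest is {E, K, O} with a support count of 3.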