StudyNp | CSIT Seventh Semester – 2081 Question | Data Warehousing and Data Mining (Syllabus Wise)

Bachelors Level/Fourth Year/Seventh Semester/Science

B.Sc Computer Science and Information Technology

Institute of Science and Technology, TU

Data Warehousing and Data Mining (CSC420)

Year Asked: 2081 (syllabus-wise questions)

Classification and Prediction

Define overfitting and underfitting. Train a decision tree classifier using the ID3 algorithm on the following training data.

\begin{array}{c|c|c|c} \text{TID} & \text{Age} & \text{Car Type} & \text{Class} \\ \hline 1 & \leq 30 & \text{Family} & \text{High} \\ 2 & \leq 30 & \text{Sports} & \text{High} \\ 3 & >30 & \text{Sports} & \text{High} \\ 4 & >30 & \text{Family} & \text{Low} \\ 5 & >30 & \text{Truck} & \text{Low} \\ 6 & \leq 30 & \text{Family} & \text{High} \\ \end{array}

[10]
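A minimal Python sketch of ID3 on this table, for checking a hand calculation. It assumes "Car Type" is a single attribute with a four-column table, and the attribute names `Age`, `CarType` and `Class`, the helper names and the tie-breaking rule are all illustrative choices, not part of the question.

```python
from collections import Counter
from math import log2

# Training data from the question, reading "Car Type" as one attribute
# (assumption: the extracted header had split it into two columns).
DATA = [
    {"Age": "<=30", "CarType": "Family", "Class": "High"},
    {"Age": "<=30", "CarType": "Sports", "Class": "High"},
    {"Age": ">30", "CarType": "Sports", "Class": "High"},
    {"Age": ">30", "CarType": "Family", "Class": "Low"},
    {"Age": ">30", "CarType": "Truck", "Class": "Low"},
    {"Age": "<=30", "CarType": "Family", "Class": "High"},
]


def entropy(rows, target="Class"):
    """Shannon entropy (base 2) of the class label over `rows`."""
    n = len(rows)
    counts = Counter(r[target] for r in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def info_gain(rows, attr, target="Class"):
    """Entropy of `rows` minus the weighted entropy after splitting on `attr`."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(rows, target) - remainder


def id3(rows, attrs, target="Class"):
    """Return a nested dict {attribute: {value: subtree or class label}}."""
    labels = Counter(r[target] for r in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]  # pure node, or majority vote
    # Rounding lets equal gains compare equal; max() then keeps the first
    # attribute in `attrs`, so list order breaks ties.
    best = max(attrs, key=lambda a: round(info_gain(rows, a, target), 10))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}
```

On this data, Gain(Age) and Gain(Car Type) both come out at about 0.459 bits, so the root split is a tie. With `Age` listed first, the sketch splits on Age, labels every ≤30 record High, and splits the >30 branch on Car Type (Sports → High, Family and Truck → Low). Putting Car Type at the root instead gives an equally valid tree.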

What is a support vector? How do you evaluate the accuracy of a classifier? Describe. [5]
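Classifier accuracy is usually judged from a confusion matrix built on held-out test data. The Python sketch below computes accuracy, precision, recall (sensitivity), specificity and F1 from true and predicted labels. The label lists are hypothetical, and the function names are illustrative.

```python
# Hypothetical test-set labels and classifier predictions.
Y_TRUE = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
Y_PRED = ["yes", "no", "no", "yes", "yes", "no", "yes", "yes"]


def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive):
    """Common confusion-matrix measures; 0.0 where a denominator is zero."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }
```

Here TP = 3, FP = 2, FN = 1 and TN = 2, which gives an accuracy of 5/8 = 0.625, a precision of 0.6 and a recall of 0.75. Holdout or k-fold cross-validation decides which records act as the test set.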

Cluster Analysis

Using the k-means++ algorithm and Euclidean distance, find the three initial cluster centroids from the points A1 = (3, 11), A2 = (3, 6), A3 = (9, 5), A4 = (6, 9), A6 = (7, 5), A7 = (2, 3), A8 = (5, 10). Choose (3, 11) as one of the initial centroids. [5]
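A minimal Python sketch of k-means++ seeding on the listed points, with A1 as the first centroid. Standard k-means++ samples the next centroid with probability proportional to D² (the squared distance to the nearest chosen centroid). Hand-worked answers often take the point with the largest D² instead. The sketch supports both, and its function and parameter names are hypothetical.

```python
import random

# Points from the question (the list has no A5, so none is added here).
POINTS = {
    "A1": (3, 11), "A2": (3, 6), "A3": (9, 5), "A4": (6, 9),
    "A6": (7, 5), "A7": (2, 3), "A8": (5, 10),
}


def sq_dist(p, q):
    """Squared Euclidean distance."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_pp_init(points, first, k, rng=None):
    """Pick k initial centroid names, starting from `first`.

    D^2 of a point = squared distance to its nearest chosen centroid.
    rng=None: take the point with the largest D^2 (deterministic convention).
    rng given: sample with probability D^2 / sum(D^2) (standard k-means++).
    """
    chosen = [first]
    while len(chosen) < k:
        d2 = {name: min(sq_dist(p, points[c]) for c in chosen)
              for name, p in points.items()}
        if rng is None:
            chosen.append(max(d2, key=d2.get))
        else:
            names = list(d2)  # already-chosen points have weight 0
            chosen.append(rng.choices(names, weights=[d2[n] for n in names])[0])
    return chosen
```

From A1, the D² values are A2 = 25, A3 = 72, A4 = 13, A6 = 52, A7 = 65 and A8 = 5 (sum 232), so A3 is the most likely second centroid (72/232 ≈ 0.31). After A3 is added, D² becomes 25, 13, 4, 53 and 5 (sum 100), so A7 is the most likely third centroid (0.53). Under the largest-D² convention this gives (3, 11), (9, 5) and (2, 3).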

Differentiate between the k-means and k-medoids clustering algorithms. [5]
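One key difference is how each algorithm represents a cluster. k-means uses the mean, which need not be a data point and is pulled toward outliers. k-medoids uses a medoid, the actual object with the smallest total distance to the others. The small Python sketch below compares the two on a hypothetical cluster that contains one outlier.

```python
from math import dist

# Hypothetical cluster; (10, 10) is an outlier.
CLUSTER = [(1, 1), (2, 1), (1, 2), (2, 2), (10, 10)]


def mean_centre(points):
    """k-means style representative: coordinate-wise mean."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def medoid(points):
    """k-medoids style representative: the point minimising total distance."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))
```

The mean is (3.2, 3.2), which is not a member of the cluster and has been dragged toward (10, 10). The medoid is (2, 2), which remains a real object inside the dense group.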

Data Cube Technology

Explain the general strategies for cube computation. [5]
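The Python sketch below illustrates two of the general strategies on a hypothetical fact table: caching intermediate results, and computing each cuboid from the smallest previously computed cuboid rather than rescanning the base data. It assumes a distributive measure (a sum of sales). An algebraic measure such as an average would need its sum and count carried along. All names are illustrative.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (city, item, year, sales).
DIMS = ("city", "item", "year")
FACTS = [
    ("Kathmandu", "TV", 2023, 10),
    ("Kathmandu", "Phone", 2023, 5),
    ("Pokhara", "TV", 2024, 7),
    ("Pokhara", "Phone", 2023, 3),
]


def compute_cube(facts, dims):
    """Return {cuboid dims: {group key: summed measure}} for all 2^n cuboids."""
    base = defaultdict(int)
    for *key, measure in facts:
        base[tuple(key)] += measure
    cube = {tuple(dims): dict(base)}  # base cuboid from one scan of the facts
    for size in range(len(dims) - 1, -1, -1):
        for group in combinations(dims, size):
            # Reuse a cached cuboid with one extra dimension, picking the
            # smallest, instead of going back to the base data.
            parents = [p for p in cube if len(p) == size + 1 and set(group) <= set(p)]
            parent = min(parents, key=lambda p: len(cube[p]))
            positions = [parent.index(d) for d in group]
            agg = defaultdict(int)
            for key, value in cube[parent].items():
                agg[tuple(key[i] for i in positions)] += value
            cube[group] = dict(agg)
    return cube
```

With three dimensions the sketch builds 2³ = 8 cuboids. The apex cuboid `()` holds the grand total of 25.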

List any two OLAP operations with examples. How do you compute rule coverage and rule accuracy? [5]
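For an IF-THEN rule R over a data set D, coverage(R) = n_covers / |D| and accuracy(R) = n_correct / n_covers. Here n_covers is the number of tuples that satisfy the rule's antecedent, and n_correct is the number of those tuples whose class matches the rule's consequent. The Python sketch below applies these formulas to the training table from the ID3 question, using a hypothetical rule: IF Age > 30 THEN Class = Low.

```python
# Training table from the ID3 question ("Car Type" read as one attribute).
DATA = [
    {"Age": "<=30", "CarType": "Family", "Class": "High"},
    {"Age": "<=30", "CarType": "Sports", "Class": "High"},
    {"Age": ">30", "CarType": "Sports", "Class": "High"},
    {"Age": ">30", "CarType": "Family", "Class": "Low"},
    {"Age": ">30", "CarType": "Truck", "Class": "Low"},
    {"Age": "<=30", "CarType": "Family", "Class": "High"},
]


def coverage_and_accuracy(rows, condition, predicted, target="Class"):
    """Coverage = covered / all rows; accuracy = correct / covered."""
    covered = [r for r in rows if condition(r)]
    correct = [r for r in covered if r[target] == predicted]
    coverage = len(covered) / len(rows)
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy


# Hypothetical rule: IF Age > 30 THEN Class = Low
COV, ACC = coverage_and_accuracy(DATA, lambda r: r["Age"] == ">30", "Low")
```

The rule covers tuples 3, 4 and 5, so its coverage is 3/6 = 50%. It classifies tuples 4 and 5 correctly, so its accuracy is 2/3 ≈ 66.7%.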

Data Preprocessing

Describe any two methods of handling noisy data. [5]
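Binning is one common way to handle noisy data. The values are sorted, split into equal-frequency (equal-depth) bins, and then replaced by their bin mean or by the nearer bin boundary. A minimal Python sketch, assuming a hypothetical sorted list of prices and a bin depth of 3:

```python
# Hypothetical price values.
PRICES = [4, 8, 15, 21, 21, 24, 25, 28, 34]


def equal_depth_bins(values, depth):
    """Sort the values and split them into bins of `depth` items each."""
    data = sorted(values)
    return [data[i:i + depth] for i in range(0, len(data), depth)]


def smooth_by_means(bins):
    """Replace every value by the mean of its bin."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value by the closer of its bin's min and max (ties go to min)."""
    out = []
    for b in bins:
        lo, hi = b[0], b[-1]
        out.append([lo if v - lo <= hi - v else hi for v in b])
    return out
```

Smoothing by bin means gives 9, 22 and 29 for the three bins. Smoothing by boundaries gives [4, 4, 15], [21, 21, 24] and [25, 25, 34].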

Graph Mining and Social Network Analysis

Define graph mining. Discuss the conflict between the theory of balance and the theory of status. [5]

Define link mining. What are the roles of epsilon and MinPts in DBSCAN? [5]
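In DBSCAN, epsilon sets the radius of each point's neighbourhood. MinPts sets how many points that neighbourhood must contain before the point counts as a core point. The Python sketch below shows only this classification step (core, border, noise), not the full cluster expansion. It reuses the points from the k-means++ question and assumes illustrative values eps = 3 and MinPts = 3. It also counts a point in its own neighbourhood, as the original DBSCAN definition does; some texts do not.

```python
from math import dist

# Points from the k-means++ question; EPS and MIN_PTS are illustrative values.
POINTS = {
    "A1": (3, 11), "A2": (3, 6), "A3": (9, 5), "A4": (6, 9),
    "A6": (7, 5), "A7": (2, 3), "A8": (5, 10),
}
EPS = 3.0
MIN_PTS = 3


def eps_neighbourhood(points, name, eps):
    """Names of all points within distance eps of `name`, itself included."""
    centre = points[name]
    return {other for other, p in points.items() if dist(centre, p) <= eps}


def classify(points, eps, min_pts):
    """Split points into core, border and noise sets."""
    nbrs = {n: eps_neighbourhood(points, n, eps) for n in points}
    core = {n for n, nb in nbrs.items() if len(nb) >= min_pts}
    border = {n for n in points if n not in core and any(n in nbrs[c] for c in core)}
    noise = set(points) - core - border
    return core, border, noise
```

With these settings only A8 is a core point, A1 and A4 are border points, and A2, A3, A6 and A7 are noise. A3 and A6 lie 2 apart, but each neighbourhood holds only two points. Lowering MinPts to 2 would make both of them core points.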

Introduction to Data Mining

Distinguish between data characterization and data discrimination. What are the challenges of multimedia mining? [5]

Introduction to Data Warehousing

When do we prefer the trimmed mean for the statistical description of data? Justify with an example. Describe the multidimensional data model and the conceptual modeling of a data warehouse. [10]
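A trimmed mean drops a fixed fraction of the smallest and largest values before averaging, so it is preferred when a few extreme values would distort the ordinary mean. A minimal Python sketch, assuming hypothetical salary data with one extreme value and 10% trimming from each end:

```python
# Hypothetical monthly salaries (in thousands); 500 is an extreme value.
SALARIES = [30, 32, 35, 36, 38, 40, 41, 43, 45, 500]


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the sorted values from each end."""
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values cut from each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)
```

The ordinary mean is 84.0, pulled up by the single value 500. The 10% trimmed mean is 38.75, which is far closer to a typical salary.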

Mining Frequent Patterns

How do you generate strong association rules? From the following dataset, find the frequent itemsets using the FP-growth algorithm with a minimum support count of 3.

\begin{array}{c|c} \text{Transaction ID} & \text{Items} \\ \hline \text{T1} & \{K, E, M, O, Y\} \\ \text{T2} & \{K, E, O, Y\} \\ \text{T3} & \{K, E, M\} \\ \text{T4} & \{K, M, Y\} \\ \text{T5} & \{K, E, O\} \\ \end{array}

[10]
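A compact Python sketch of FP-growth on this dataset. It builds an FP-tree from items sorted by descending support, with ties broken alphabetically (an assumption; any fixed order works). It then mines each item's conditional pattern base recursively. The header table is kept as a plain list of nodes per item rather than linked node-links. Strong rules are then the rules A → B drawn from frequent itemsets whose confidence sup(A ∪ B) / sup(A) meets a minimum confidence. The question gives no threshold, so `min_conf` is an assumed parameter, and all names are illustrative.

```python
from collections import defaultdict
from itertools import combinations

TRANSACTIONS = [  # (items, count) pairs from the question's table
    ({"K", "E", "M", "O", "Y"}, 1),
    ({"K", "E", "O", "Y"}, 1),
    ({"K", "E", "M"}, 1),
    ({"K", "M", "Y"}, 1),
    ({"K", "E", "O"}, 1),
]


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_sup):
    """Build an FP-tree; return (header: item -> nodes, frequent item supports)."""
    support = defaultdict(int)
    for items, cnt in transactions:
        for it in items:
            support[it] += cnt
    frequent = {it: s for it, s in support.items() if s >= min_sup}
    root, header = Node(None, None), defaultdict(list)
    for items, cnt in transactions:
        # Keep frequent items only, in descending support (ties: alphabetical).
        ordered = sorted((it for it in items if it in frequent),
                         key=lambda it: (-frequent[it], it))
        node = root
        for it in ordered:
            if it not in node.children:
                node.children[it] = Node(it, node)
                header[it].append(node.children[it])
            node = node.children[it]
            node.count += cnt
    return header, frequent


def fp_growth(transactions, min_sup, suffix=()):
    """Yield (frozenset itemset, support count) for every frequent itemset."""
    header, frequent = build_tree(transactions, min_sup)
    for item in sorted(frequent, key=lambda it: (frequent[it], it)):
        pattern = (item,) + suffix
        yield frozenset(pattern), frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, up = [], node.parent
            while up.item is not None:
                path.append(up.item)
                up = up.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_sup, pattern)


def strong_rules(freq, min_conf):
    """Rules A -> B from frequent itemsets with sup(A u B) / sup(A) >= min_conf."""
    rules = []
    for itemset, sup in freq.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = sup / freq[lhs]
                if conf >= min_conf:
                    rules.append((set(lhs), set(itemset - lhs), conf))
    return rules


FREQUENT = dict(fp_growth(TRANSACTIONS, 3))
```

With a minimum support count of 3, the sketch finds 11 frequent itemsets:

- single items: {K}:5, {E}:4, {M}:3, {O}:3 and {Y}:3;
- pairs: {K, E}:4, {K, M}:3, {K, O}:3, {K, Y}:3 and {E, O}:3;
- triple: {K, E, O}:3.

With an assumed minimum confidence of 100%, eight rules qualify, for example {M} → {K} and {E, O} → {K}.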