IFB214TC Data Mining Applications
Assessment Tasks
In the era of online streaming platforms, movie recommendation systems play a pivotal role in enhancing user experience. These systems rely on data mining techniques to analyze customer preferences and provide tailored movie suggestions. As a data analyst, you are tasked with exploring a movie recommendation dataset that includes customer IDs, movie IDs, and movie ratings. By employing Python and data mining techniques, you can uncover valuable insights that contribute to better movie recommendations and user satisfaction.
Your mission is to thoroughly analyze the provided movie recommendation dataset using Python and various data mining techniques. The dataset consists of customer IDs, movie IDs, and movie ratings, representing the interactions between customers and movies. For each task, you are required to provide clear explanations, Python code implementations, and relevant visualizations. Be sure to include screenshots of your Python code in the report for reference.
Task 1: Customer Preferences and Ratings (10 Marks)
1. Identify and list movies that have the highest average ratings across all customers.
2. Detect customers who consistently rate movies positively or negatively.
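As a starting point for Task 1, the two questions can be approached with per-movie and per-customer aggregates. The following is a minimal sketch only: the column names `customer_id`, `movie_id`, and `rating`, the 1 to 5 rating scale, the toy data, and the thresholds are all assumptions made for illustration and should be adapted to the allocated file.

```python
import pandas as pd

# Toy data; column names and the 1-5 scale are assumptions, not taken from Table 1.
ratings = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "movie_id":    [10, 20, 30, 10, 20, 30, 10, 20, 30],
    "rating":      [5, 4, 5, 1, 2, 1, 3, 5, 2],
})

# Average rating per movie, highest first.
movie_means = (
    ratings.groupby("movie_id")["rating"].mean().sort_values(ascending=False)
)

# Mean, spread, and count of each customer's ratings.
customer_stats = ratings.groupby("customer_id")["rating"].agg(["mean", "std", "count"])

# Hypothetical thresholds: a high (or low) mean combined with a small spread.
POSITIVE_MEAN, NEGATIVE_MEAN, MAX_STD = 4.0, 2.0, 1.0
consistently_positive = customer_stats[
    (customer_stats["mean"] >= POSITIVE_MEAN) & (customer_stats["std"] <= MAX_STD)
]
consistently_negative = customer_stats[
    (customer_stats["mean"] <= NEGATIVE_MEAN) & (customer_stats["std"] <= MAX_STD)
]

print(movie_means.head())
print(consistently_positive)
print(consistently_negative)
```

On real data it may be worth requiring a minimum number of ratings per movie and per customer before ranking, since an average based on one or two ratings is unreliable.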
Task 2: Movie Popularity and Ratings (15 Marks)
1. Determine the top 10 most popular movies based on the number of ratings they received.
2. Investigate whether there is a correlation between a movie's popularity (number of ratings) and its average rating.
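One possible way to approach Task 2 is to build a per-movie summary table and then correlate its two columns. This sketch assumes the hypothetical column names `customer_id`, `movie_id`, and `rating` and uses toy data. The Spearman coefficient is computed here as the Pearson correlation of the ranks, which avoids any dependency beyond pandas.

```python
import pandas as pd

# Toy data; column names are assumptions, not taken from Table 1.
ratings = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 1, 2, 3, 1, 2, 4],
    "movie_id":    [10, 10, 10, 10, 20, 20, 20, 30, 30, 40],
    "rating":      [5, 4, 5, 4, 3, 2, 3, 1, 2, 3],
})

# One row per movie: how often it was rated and its average rating.
movie_stats = ratings.groupby("movie_id")["rating"].agg(
    n_ratings="count", avg_rating="mean"
)

# Top 10 movies by number of ratings (the toy data has only four movies).
top10 = movie_stats.sort_values("n_ratings", ascending=False).head(10)

# Linear (Pearson) correlation between popularity and average rating.
pearson = movie_stats["n_ratings"].corr(movie_stats["avg_rating"])

# Rank-based (Spearman) correlation: Pearson correlation of the ranks.
spearman = movie_stats["n_ratings"].rank().corr(movie_stats["avg_rating"].rank())

print(top10)
print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

A scatter plot of `n_ratings` against `avg_rating` is a natural companion to the coefficients, because a single number can hide a non-linear relationship.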
Task 3: Outlier Detection and Anomalies (10 Marks)
Identify customers who consistently provide extreme ratings (e.g., always giving the lowest or highest ratings). Explore the potential impact of these outliers on the recommendation system.
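A simple way to make "consistently extreme" concrete is to measure, for each customer, the share of ratings that sit at either end of the scale, and then compare movie averages with and without those customers. The sketch below is illustrative only: the column names, the 1 to 5 scale, the minimum of three ratings, and the 0.9 share threshold are all assumptions.

```python
import pandas as pd

# Toy data; column names and the 1-5 scale are assumptions, not taken from Table 1.
ratings = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
    "movie_id":    [10, 20, 30, 10, 20, 30, 10, 20, 30, 10, 20],
    "rating":      [5, 5, 5, 1, 1, 1, 3, 4, 2, 4, 5],
})

SCALE_MIN, SCALE_MAX = 1, 5   # assumed rating scale
MIN_RATINGS = 3               # hypothetical: ignore customers with very few ratings
EXTREME_SHARE = 0.9           # hypothetical: share of ratings at either end

# Flag each rating that sits at an end of the scale (1.0 = extreme, 0.0 = not).
flagged = ratings.assign(
    is_extreme=ratings["rating"].isin([SCALE_MIN, SCALE_MAX]).astype(float)
)
per_customer = flagged.groupby("customer_id").agg(
    n_ratings=("rating", "count"),
    extreme_share=("is_extreme", "mean"),
)
extreme_raters = per_customer[
    (per_customer["n_ratings"] >= MIN_RATINGS)
    & (per_customer["extreme_share"] >= EXTREME_SHARE)
]

# Impact check: how far does each movie's average move when they are excluded?
with_all = ratings.groupby("movie_id")["rating"].mean()
kept = ratings[~ratings["customer_id"].isin(extreme_raters.index)]
without_extremes = kept.groupby("movie_id")["rating"].mean()
shift = (with_all - without_extremes).abs().sort_values(ascending=False)

print(extreme_raters)
print(shift)
```

Movies whose average shifts the most are the ones where a recommender based on raw averages would be most influenced by these customers.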
Task 4: Clustering Analysis (20 Marks)
Apply the K-Means clustering algorithm to group customers based on their movie ratings. Use techniques like Principal Component Analysis (PCA) for dimensionality reduction and visualize the resulting clusters using scatter plots or other appropriate visualization methods.
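The Task 4 workflow can be sketched with scikit-learn as follows. Everything here is an illustrative assumption: the column names, the synthetic ratings (two groups of customers with opposite tastes), the choice of k = 2, and the decision to fill unrated movies with 0. In this sketch K-Means runs on the full customer-by-movie matrix and PCA is used only to obtain two plotting coordinates; clustering on the PCA output instead is an equally reasonable design.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic ratings: customers 0-9 prefer movies 0-3, customers 10-19 prefer movies 4-7.
rng = np.random.default_rng(0)
rows = []
for customer in range(20):
    likes_first_half = customer < 10
    for movie in range(8):
        base = 4.5 if (movie < 4) == likes_first_half else 1.5
        rating = int(np.clip(np.rint(base + rng.normal(0, 0.5)), 1, 5))
        rows.append((customer, movie, rating))
ratings = pd.DataFrame(rows, columns=["customer_id", "movie_id", "rating"])

# Customer x movie matrix; unrated cells become 0 (a simplifying assumption).
matrix = ratings.pivot_table(
    index="customer_id", columns="movie_id", values="rating"
).fillna(0)
X = matrix.to_numpy()

# Group customers into k clusters (k = 2 is chosen only to match the toy data).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Project the rating vectors onto two principal components for plotting.
coords = PCA(n_components=2).fit_transform(X)


def plot_clusters(coords, labels):
    """Scatter plot of the 2-D PCA coordinates, coloured by cluster label."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1], c=labels)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_title("Customer clusters (K-Means, k=2)")
    return fig
```

Calling `plot_clusters(coords, labels)` returns a Matplotlib figure that can be shown or saved. On real data, k is usually chosen with a diagnostic such as the elbow method or the silhouette score rather than fixed in advance.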
Task 5: Apriori Algorithm for Association Rules (25 Marks)
1. Preprocess the dataset to prepare it for the Apriori algorithm.
2. Apply the Apriori algorithm to discover frequent itemsets, representing movies that are frequently rated together.
3. Provide a sample of the discovered frequent itemsets and the corresponding association rules.
4. Discuss the insights gained from the results and propose how these insights could be utilized to enhance movie recommendations.
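The Task 5 steps above can be illustrated with a small, self-contained version of the Apriori idea written in plain Python. This is a teaching sketch under stated assumptions, not a reference implementation: the column names and toy data are invented, each customer's "basket" is taken to be the set of movies they rated (filtering to ratings above some threshold first is another option), and the helper names `support`, `apriori`, and `rules` are hypothetical. In practice a library implementation is normally used.

```python
from itertools import combinations

import pandas as pd

# Toy data; column names are assumptions, not taken from Table 1.
ratings = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4],
    "movie_id":    [10, 20, 30, 10, 20, 10, 20, 40, 20, 30],
    "rating":      [5, 4, 3, 4, 5, 2, 4, 5, 3, 4],
})

# Preprocessing: one basket per customer, holding the movies they rated.
baskets = [
    frozenset(group.tolist())
    for _, group in ratings.groupby("customer_id")["movie_id"]
]
n_baskets = len(baskets)


def support(itemset):
    """Fraction of baskets that contain every movie in the itemset."""
    return sum(itemset <= basket for basket in baskets) / n_baskets


def apriori(min_support):
    """Level-wise search: size-k candidates come only from frequent size-(k-1) sets."""
    items = sorted({movie for basket in baskets for movie in basket})
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent, k = {}, 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # Join step: unions of two frequent sets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


def rules(frequent, min_confidence):
    """Rules antecedent -> consequent with their support, confidence, and lift."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    found.append((antecedent, consequent, sup, confidence, lift))
    return found


frequent = apriori(min_support=0.5)
found_rules = rules(frequent, min_confidence=0.7)
for antecedent, consequent, sup, confidence, lift in found_rules:
    print(set(antecedent), "->", set(consequent), sup, confidence, lift)
```

A rule is generally only interesting for recommendations when its lift is clearly above 1, meaning the two movies are rated together more often than their individual popularity would predict. In this toy data every customer rated movie 20, so all rules have a lift of 1.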
Overall Presentation and Code Quality (20 Marks)
Evaluate the clarity and coherence of your analysis, and ensure proper documentation and organization of your code. Pay attention to code readability and efficiency.
Note: Each student will be allocated a unique xls file, which will be uploaded to LMO in due course. The sample data structure is shown in Table 1.