Movie Recommendation System Using Content-Based Filtering
This repository hosts a content-based movie recommendation system built with Python. It uses metadata from a movie dataset to suggest similar movies based on features like genres, cast, crew, and keywords.
Model Details
- Model Type: Content-Based Recommendation System
- Technique Used: Cosine Similarity
- Libraries: pandas, scikit-learn, NumPy
- Dataset: TMDB 5000 Movie Dataset (or similar metadata-rich dataset)
- Task: Movie Recommendation based on content similarity
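To make the cosine similarity technique concrete, here is a minimal sketch on three made-up feature strings. The strings and the `toy_` names are illustrative assumptions, not data from the repository. Movies 0 and 1 share three of their four tokens, so their score is 3 / (2 × 2) = 0.75, while movie 2 shares none and scores 0.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical feature strings standing in for combined movie metadata.
toy_features = [
    "action scifi dream heist",   # movie 0
    "action scifi space heist",   # movie 1
    "romance drama paris",        # movie 2
]

# Each row becomes a vector of token counts over the shared vocabulary.
toy_vectors = CountVectorizer().fit_transform(toy_features)

# Entry [i, j] is the cosine of the angle between count vectors i and j.
toy_similarity = cosine_similarity(toy_vectors)

print(toy_similarity.round(2))
```

Because the score depends only on shared tokens, two movies look similar exactly to the extent that their metadata strings overlap.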
Usage
Installation
pip install pandas scikit-learn numpy
Running the Model
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# Load dataset
movies = pd.read_csv('movies.csv')
# Combine relevant features into a single string
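feature_columns = ['genres', 'keywords', 'cast', 'crew']
# A missing value would turn the whole concatenation into NaN, so blank them first
movies[feature_columns] = movies[feature_columns].fillna('')
movies['combined_features'] = movies['genres'] + ' ' + movies['keywords'] + ' ' + movies['cast'] + ' ' + movies['crew']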
# Vectorize features
vectorizer = CountVectorizer()
feature_vectors = vectorizer.fit_transform(movies['combined_features'])
# Compute similarity matrix
similarity = cosine_similarity(feature_vectors)
# Define recommendation function
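def recommend(movie_name):
    matches = movies[movies['title'] == movie_name]
    if matches.empty:
        print(f"'{movie_name}' was not found in the dataset.")
        return
    movie_index = matches.index[0]
    distances = similarity[movie_index]
    # Sort by similarity, highest first; the top entry is normally the movie itself, so skip it
    movie_list = sorted(enumerate(distances), key=lambda x: x[1], reverse=True)[1:6]
    for i, _ in movie_list:
        print(movies.iloc[i]['title'])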
# Example usage
recommend("Inception")
Performance Metrics
This is a heuristic, unsupervised model, so standard supervised metrics such as accuracy or F1 do not apply. Evaluation is qualitative: recommendations are judged by how relevant they appear and by user satisfaction.
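For a rough sanity check alongside manual review, one option is a proxy score such as the average genre overlap between a query movie and its recommendations. The sketch below is an assumption for illustration, not an evaluation used by this project; `genre_overlap` and the sample genre lists are hypothetical.

```python
def genre_overlap(query_genres, recommended_genres):
    """Mean Jaccard overlap between the query's genre set and each recommendation's."""
    query = set(query_genres)
    scores = []
    for genres in recommended_genres:
        other = set(genres)
        union = query | other
        # Jaccard index: shared genres divided by all genres seen in either movie.
        scores.append(len(query & other) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical query movie and three recommendations.
score = genre_overlap(
    ["Action", "ScienceFiction"],
    [["Action", "ScienceFiction"], ["Action", "Thriller"], ["Romance"]],
)
print(round(score, 3))
```

A score near 1 means the recommendations stay within the query's genres. Since genres are also model inputs, a high score checks consistency, not user-perceived quality.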
Dataset Details
The dataset includes the following fields:
- Title
- Genres
- Keywords
- Cast
- Crew
Preprocessing includes:
- Removing nulls and duplicates
- Parsing nested JSON fields into readable text
- Combining features for vectorization
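The preprocessing steps above can be sketched as follows. This assumes the nested fields are stored as JSON lists of objects with a `name` key, as in common TMDB 5000 CSV exports. The helper `names_from_json` and the two-row sample frame are hypothetical, and the notebook may do this differently.

```python
import json

import pandas as pd


def names_from_json(raw):
    """Turn a JSON list of {"name": ...} objects into a space-separated string."""
    if not isinstance(raw, str):
        return ""  # treat missing values (NaN/None) as empty text
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return ""
    if not isinstance(items, list):
        return ""
    # Remove inner spaces so "Science Fiction" stays a single token.
    names = [
        item["name"].replace(" ", "")
        for item in items
        if isinstance(item, dict) and "name" in item
    ]
    return " ".join(names)


# Hypothetical sample in the assumed format, with one duplicated row.
sample = pd.DataFrame({
    "title": ["Movie A", "Movie A"],
    "genres": ['[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'] * 2,
    "keywords": ['[{"id": 1, "name": "dream"}]'] * 2,
})

# Drop rows without a title, then drop repeated titles.
clean = sample.dropna(subset=["title"]).drop_duplicates(subset="title").copy()

# Parse each nested field into plain text, then combine for vectorization.
for column in ["genres", "keywords"]:
    clean[column] = clean[column].apply(names_from_json)
clean["combined_features"] = clean["genres"] + " " + clean["keywords"]

print(clean["combined_features"].iloc[0])
```

Collapsing multi-word names into one token is a design choice: it keeps `CountVectorizer` from treating "Science" and "Fiction" as unrelated features.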
Repository Structure
.
├── movies.csv                    # Dataset file
├── recommendation_system.ipynb   # Main notebook
└── README.md                     # Documentation file
Limitations
- Not personalized; recommendations are the same for all users.
- Doesn't account for user ratings or feedback.
- Limited by the richness and correctness of metadata in the dataset.
Contributing
Suggestions and improvements are welcome! Feel free to open issues or pull requests to help improve this project.