Movie Recommendation System Using Content-Based Filtering

This repository hosts a content-based movie recommendation system built with Python. It uses metadata from a movie dataset to suggest similar movies based on features like genres, cast, crew, and keywords.


Model Details

  • Model Type: Content-Based Recommendation System
  • Technique Used: Cosine Similarity
  • Libraries: pandas, scikit-learn, NumPy
  • Dataset: TMDB 5000 Movie Dataset (or similar metadata-rich dataset)
  • Task: Movie Recommendation based on content similarity
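
To make the technique concrete, the sketch below computes cosine similarity by hand for a few made-up feature strings, using word counts in roughly the way a bag-of-words vectorizer would. The strings and the helper name `cosine_sim` are illustrative assumptions, not part of this repository, and the whitespace tokenization is simpler than what scikit-learn's `CountVectorizer` does by default.

```python
import math
from collections import Counter

def cosine_sim(text_a: str, text_b: str) -> float:
    """Cosine similarity between the word-count vectors of two strings."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product only needs the words that appear in both strings
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty string has no direction to compare
    return dot / (norm_a * norm_b)

# Hypothetical feature strings (not taken from the dataset)
movie_a = "action thriller heist dream"
movie_b = "action thriller heist bank"
movie_c = "romance comedy wedding"

print(cosine_sim(movie_a, movie_b))  # 3 shared words, 4 words each -> 0.75
print(cosine_sim(movie_a, movie_c))  # no shared words -> 0.0
```

A score of 1.0 means the two count vectors point in the same direction, and 0.0 means the movies share no feature words at all.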

Usage

Installation

pip install pandas scikit-learn numpy

Running the Model

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load dataset
movies = pd.read_csv('movies.csv')

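# Combine relevant features into a single string.
# Fill missing values first: one NaN would turn the whole combined string into NaN.
features = ['genres', 'keywords', 'cast', 'crew']
movies[features] = movies[features].fillna('')
movies['combined_features'] = movies['genres'] + ' ' + movies['keywords'] + ' ' + movies['cast'] + ' ' + movies['crew']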

# Vectorize features
vectorizer = CountVectorizer()
feature_vectors = vectorizer.fit_transform(movies['combined_features'])

# Compute similarity matrix
similarity = cosine_similarity(feature_vectors)

# Define recommendation function
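def recommend(movie_name, top_n=5):
    # Positional indices of rows whose title matches exactly
    matches = (movies['title'] == movie_name).to_numpy().nonzero()[0]
    if len(matches) == 0:
        print(f"No movie titled {movie_name!r} in the dataset")
        return
    movie_index = matches[0]
    scores = similarity[movie_index]
    # Rank every movie by similarity (highest first), then drop the query movie itself
    ranked = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
    ranked = [(i, s) for i, s in ranked if i != movie_index][:top_n]
    for i, _ in ranked:
        print(movies.iloc[i]['title'])
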
# Example usage
recommend("Inception")

Performance Metrics

This is a heuristic model with no ground-truth labels, so standard ML performance metrics like accuracy or F1 do not apply. Evaluation is subjective, based on how relevant the recommendations appear and on user satisfaction.


Dataset Details

The dataset includes the following fields:

  • Title
  • Genres
  • Keywords
  • Cast
  • Crew

Preprocessing includes:

  • Removing nulls and duplicates
  • Parsing nested JSON fields into readable text
  • Combining features for vectorization
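
The JSON-parsing step above might look roughly like the following sketch. It assumes each nested field is stored as a JSON list of objects with a `name` key, as in the TMDB 5000 CSV files. The helper name `names_from_json` and the choice to strip spaces inside names (so "Science Fiction" stays a single token) are assumptions for illustration, not necessarily what the notebook does.

```python
import json

def names_from_json(raw, limit=None):
    """Turn a JSON list of {"name": ...} objects into a space-separated string."""
    try:
        items = json.loads(raw)
    except (TypeError, ValueError):
        return ""  # missing (None/NaN) or malformed field
    if not isinstance(items, list):
        return ""
    # Remove spaces inside each name so multi-word names become one token
    names = [
        str(item["name"]).replace(" ", "")
        for item in items
        if isinstance(item, dict) and "name" in item
    ]
    return " ".join(names[:limit])

# Hypothetical raw value in the assumed format
raw_genres = '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'
print(names_from_json(raw_genres))           # Action ScienceFiction
print(names_from_json(raw_genres, limit=1))  # Action
```

With pandas, a helper like this could be applied to each nested column, for example `movies['genres'].apply(names_from_json)`, before the features are combined. The `limit` argument is one way to keep only the first few cast members.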

Repository Structure

.
├── movies.csv                   # Dataset file
├── recommendation_system.ipynb  # Main notebook
└── README.md                    # Documentation file

Limitations

  • Not personalized; recommendations are the same for all users.
  • Doesn't account for user ratings or feedback.
  • Limited by the richness and correctness of metadata in the dataset.

Contributing

Suggestions and improvements are welcome! Feel free to open issues or pull requests to help improve this project.