Movie Review Sentiment Analysis using Naive Bayes

Understanding Naive Bayes for Sentiment Analysis

1. Introduction

Sentiment analysis is a natural language processing (NLP) technique that determines whether a piece of text expresses positive, negative, or neutral sentiment. It is applied in customer feedback analysis, social media monitoring, and market research.

2. Importance of Sentiment Analysis
  • Enables businesses to understand customer emotions and opinions.
  • Automates the analysis of large volumes of text data.
  • Improves decision-making by identifying trends and customer satisfaction levels.
  • Supports recommendation systems, brand monitoring, and product review analysis.
3. What is Naive Bayes?

Naive Bayes is a supervised, probabilistic classification algorithm based on Bayes' Theorem. It is particularly effective for text classification tasks such as spam detection and sentiment analysis.

3.1 Bayes' Theorem

Bayes' Theorem gives the probability of an event A once a related event B has been observed, expressed in terms of the reverse conditional probability P(B|A) and the prior probability of A.

Formula:
P(A|B) = [P(B|A) × P(A)] / P(B)

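  • P(A|B) → Posterior probability of A after observing B
  • P(B|A) → Likelihood of observing B when A holds
  • P(A) → Prior probability of A (in classification, the class)
  • P(B) → Evidence (overall probability of observing B)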
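As a quick numeric check of the formula, the sketch below applies Bayes' Theorem to made-up numbers, where A is "the review is positive" and B is "the review contains the word great". The three input probabilities and the helper name bayes_posterior are illustrative assumptions, not measured values or part of any library.

```python
def bayes_posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence <= 0:
        raise ValueError("evidence P(B) must be positive")
    return likelihood * prior / evidence


# Hypothetical numbers chosen only for illustration.
p_great_given_pos = 0.30   # P(B|A): "great" appears in 30% of positive reviews
p_great_given_neg = 0.05   # P(B|not A): "great" appears in 5% of negative reviews
p_pos = 0.50               # P(A): half of all reviews are positive

# Evidence P(B) via the law of total probability.
p_great = p_great_given_pos * p_pos + p_great_given_neg * (1 - p_pos)

p_pos_given_great = bayes_posterior(p_great_given_pos, p_pos, p_great)
print(round(p_pos_given_great, 4))  # 0.8571
```

With these assumed inputs, seeing "great" raises the probability of a positive review from the prior of 0.50 to about 0.86.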
3.2 Why is it Called "Naive"?

The algorithm assumes that all features (the words in a review) are conditionally independent given the class label. This "naive" assumption rarely holds in natural language, yet the classifier often performs well in practice.

3.3 Mathematical Formulation of Naive Bayes for Sentiment Analysis

Given a document X with words w₁, w₂, …, wₙ, we want to find the most probable class C:

P(C|X) = [P(X|C) × P(C)] / P(X)

Since P(X) is the same for every class, it does not affect which class scores highest, so we maximize:

P(C|X) ∝ P(C) × P(X|C)

Under the naive independence assumption:

P(X|C) = Π(i=1 to n) P(w_i|C)

Final scoring function:

P(C|X) ∝ P(C) × Π(i=1 to n) P(w_i|C)
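The scoring function can be sketched in a few lines of Python. The priors and per-class word probabilities below are hand-picked, hypothetical values, and the names score and classify are chosen only for this illustration.

```python
from math import prod

# Hypothetical, hand-picked probabilities for illustration only.
priors = {"pos": 0.5, "neg": 0.5}
word_probs = {
    "pos": {"great": 0.20, "boring": 0.02, "movie": 0.10},
    "neg": {"great": 0.03, "boring": 0.15, "movie": 0.10},
}


def score(words, label):
    """Unnormalized score: P(C) times the product of P(w_i|C)."""
    return priors[label] * prod(word_probs[label][w] for w in words)


def classify(words):
    """Return the class with the highest unnormalized score."""
    return max(priors, key=lambda label: score(words, label))


review = ["great", "movie"]
scores = {label: score(review, label) for label in priors}
print(classify(review))  # pos
```

This sketch only handles words that appear in the probability table; any other word raises a KeyError.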

Laplace (Add-One) Smoothing

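A word that never appears with a class in training would get probability zero and wipe out the whole product. To avoid this, add one to every count:

P(w_i|C) = (count(w_i, C) + 1) / (total word count in class C + |V|)

where count(w_i, C) is the number of times w_i occurs in the training documents of class C, and |V| is the vocabulary size (the number of distinct words in the training data).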
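A minimal sketch of the smoothed estimate, using toy counts for a hypothetical "pos" class and a four-word vocabulary (both invented for illustration):

```python
from collections import Counter


def smoothed_likelihood(word, class_counts, vocab_size):
    """(count(w, C) + 1) / (total words in class C + |V|)."""
    total = sum(class_counts.values())
    # Counter returns 0 for a word it has never seen.
    return (class_counts[word] + 1) / (total + vocab_size)


# Toy counts for a hypothetical "pos" class.
pos_counts = Counter({"great": 3, "fun": 1})
vocab = {"great", "fun", "boring", "dull"}   # |V| = 4

print(smoothed_likelihood("great", pos_counts, len(vocab)))   # (3 + 1) / (4 + 4) = 0.5
print(smoothed_likelihood("boring", pos_counts, len(vocab)))  # (0 + 1) / (4 + 4) = 0.125
```

The unseen word "boring" now gets a small nonzero probability, and the smoothed probabilities over the whole vocabulary still sum to 1.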

Log Probabilities (for Numerical Stability)

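Multiplying many small probabilities can underflow to zero in floating-point arithmetic, so we add logarithms instead:

log P(C|X) = log P(C) + Σ(i=1 to n) log P(w_i|C) − log P(X)

The term log P(X) is the same for every class, so we drop it and pick the class with the highest log score.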
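The underflow problem is easy to reproduce. The sketch below uses 400 identical, made-up likelihoods of 1e-5 as a stand-in for a long review:

```python
import math

# 400 hypothetical word likelihoods, each equal to 1e-5.
likelihoods = [1e-5] * 400

product = math.prod(likelihoods)                  # true value 1e-2000 underflows
log_sum = sum(math.log(p) for p in likelihoods)   # stays a finite number

print(product)   # 0.0
print(log_sum)   # roughly -4605.17
```

Once the product collapses to 0.0, every class would tie at zero, while the log scores remain comparable.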

4. Steps in Sentiment Analysis Using Naive Bayes
Step 4: Train the Naive Bayes Model
  • Prior probability:
    P(C) = (number of documents in class C) / (total number of documents)
  • Likelihoods: P(w_i|C), computed with Laplace smoothing (see the formula in Section 3.3)
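The training step can be sketched from scratch as follows. This is a minimal illustration rather than a reference implementation: the function names train and predict, the four-review training set, and the choice to skip words never seen in training are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (list_of_words, label). Returns (log_priors, word_counts, vocab)."""
    doc_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for words, label in docs:
        word_counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}
    # Prior: documents in class C divided by total documents, stored as a log.
    log_priors = {c: math.log(n / len(docs)) for c, n in doc_counts.items()}
    return log_priors, word_counts, vocab


def predict(words, log_priors, word_counts, vocab):
    """Return the class with the highest log score."""
    best_label, best_score = None, -math.inf
    for label, log_prior in log_priors.items():
        total = sum(word_counts[label].values())
        score = log_prior
        for w in words:
            if w not in vocab:
                continue  # assumption: ignore words never seen in training
            # Laplace-smoothed likelihood, added in log space.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny made-up training set, already tokenized and lowercased.
training_docs = [
    ("great fun movie".split(), "pos"),
    ("great acting".split(), "pos"),
    ("boring dull movie".split(), "neg"),
    ("dull plot".split(), "neg"),
]
model = train(training_docs)
print(predict("great movie".split(), *model))       # pos
print(predict("dull boring plot".split(), *model))  # neg
```

Storing raw counts and applying smoothing at prediction time keeps the trained model small; precomputing the log-likelihoods once would be a reasonable alternative for larger vocabularies.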

(All other steps remain the same — only formulas are highlighted)

5. Advantages of Naive Bayes in Sentiment Analysis
  • Very fast training and prediction
  • Works well with high-dimensional text data
  • Surprisingly effective despite its strong independence assumption
  • Requires relatively little training data
6. Limitations of Naive Bayes
  • Feature independence assumption is violated in real language
  • Handles sarcasm, negation ("not good"), and context poorly, because each word is scored on its own and word order is ignored
  • Assigns zero probability to a word never seen with a class in training, unless smoothing is applied
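The negation problem can be illustrated with a short sketch, assuming a unigram bag-of-words representation (the helper bag_of_words and both sentences are invented for this example):

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count words (order is discarded)."""
    return Counter(text.lower().split())


a = bag_of_words("not good actually bad")
b = bag_of_words("not bad actually good")
print(a == b)   # True: same word counts, opposite meaning
```

Because both reviews produce identical word counts, a classifier that sees only these counts must give them the same score.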
7. Key Takeaways
  • Naive Bayes is a simple, fast, and surprisingly powerful baseline for sentiment analysis
  • Text preprocessing significantly impacts performance
  • Laplace smoothing and log probabilities are essential practical tricks
  • Always evaluate on held-out (unseen) data, using metrics such as accuracy, precision, recall, and F1
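As a rough sketch of the evaluation step, the function below computes accuracy, precision, and recall for one class. The function name evaluate and the label lists are made up for illustration and do not come from a real experiment.

```python
def evaluate(y_true, y_pred, positive="pos"):
    """Return accuracy, precision, and recall, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up held-out labels and predictions, for illustration only.
y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Here three of five predictions are correct (accuracy 0.6), and both precision and recall for the "pos" class are 2/3.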