https://github.com/brayvid/tweet-sentiment-classifier
My Phase 3 project for the Flatiron School Data Science Bootcamp addressed the business problem of brand reputation management by monitoring and analyzing Twitter sentiment. The goal was to build a machine learning model that classifies tweets as positive, negative, or neutral, and to use its output to inform brand perception and engagement strategies.
View the slides from my presentation here.
Business Problem
Brand reputation management: Monitor brand perception by correctly classifying new tweets as positive, negative, or neutral.
- Product Improvement: Analyze negative feedback for insights into product weaknesses.
- Collaboration Opportunities: Identify accounts with consistent positive sentiment for potential collaborations.
- Product Launch Strategy: Time new product launches during periods of high positive sentiment.
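All three uses start from tweets the model has already labeled. The following is a minimal sketch, assuming pandas and a table of classified tweets with hypothetical `user`, `created_at`, and `predicted` columns. These columns are not part of the training dataset, and the helper names and the `min_tweets` threshold are illustrative only. It computes per-account and per-day positive shares.

```python
import pandas as pd


def positive_share_by_account(df: pd.DataFrame, min_tweets: int = 5) -> pd.Series:
    """Share of each account's tweets predicted 'positive' (hypothetical columns)."""
    is_positive = df["predicted"].eq("positive").astype(float)
    stats = is_positive.groupby(df["user"]).agg(["mean", "size"])
    # Ignore accounts with too few tweets to call their sentiment "consistent".
    stats = stats[stats["size"] >= min_tweets]
    return stats["mean"].sort_values(ascending=False)


def daily_positive_share(df: pd.DataFrame) -> pd.Series:
    """Share of tweets predicted 'positive' per calendar day (hypothetical columns)."""
    is_positive = df["predicted"].eq("positive").astype(float)
    return is_positive.groupby(df["created_at"].dt.floor("D")).mean()


# Toy, made-up classified tweets used only to exercise the helpers.
tweets = pd.DataFrame({
    "user": ["a", "a", "b", "b", "b"],
    "created_at": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 17:00",
        "2024-01-02 10:00", "2024-01-02 11:00", "2024-01-02 12:00",
    ]),
    "predicted": ["positive", "positive", "negative", "neutral", "positive"],
})
print(positive_share_by_account(tweets, min_tweets=2))
print(daily_positive_share(tweets))
```

The `min_tweets` cutoff is an arbitrary guard that prevents an account with a single positive tweet from being ranked as consistently positive.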
Dataset
The training dataset, sourced from Kaggle, includes:
- Labels (positive, negative, or neutral) in the ‘sentiment’ column.
- 27,000 tweets in the ‘text’ column.
- A ‘selected_text’ column containing the substring of each tweet that supports its sentiment label.
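The following is a minimal loading and splitting sketch, assuming pandas and scikit-learn. The file name `train.csv`, the 80/20 split, the seed, and the helper name `prepare_split` are assumptions for illustration, not the project's recorded settings. Stratifying on `sentiment` keeps the class proportions the same in the training and test sets.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

LABELS = ("negative", "neutral", "positive")


def prepare_split(df, text_col="selected_text", test_size=0.2, seed=42):
    """Drop incomplete or unexpected rows, then make a stratified train/test split."""
    data = df[[text_col, "sentiment"]].dropna()
    data = data[data["sentiment"].isin(LABELS)]
    # Stratify so each split keeps the same negative/neutral/positive proportions.
    return train_test_split(
        data[text_col],
        data["sentiment"],
        test_size=test_size,
        random_state=seed,
        stratify=data["sentiment"],
    )


# df = pd.read_csv("train.csv")  # hypothetical path to the Kaggle CSV
# Toy stand-in with the same column names; the last row is missing its text.
toy = pd.DataFrame({
    "selected_text": [
        "so happy", "love it", "great day", "awesome", "nice",
        "hate it", "awful", "so sad", "terrible", "worst",
        "at work", "going home", "it is monday", "eating lunch", "on the bus",
        None,
    ],
    "sentiment": ["positive"] * 5 + ["negative"] * 5 + ["neutral"] * 6,
})
X_train, X_test, y_train, y_test = prepare_split(toy)
```

Passing `text_col="text"` would train on full tweets instead of the ‘selected_text’ substrings.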
Results
Of the model types I tried, a Support Vector Classifier (SVC) trained on the ‘selected_text’ column performed best, reaching 83% accuracy on the test set. Per-class precision and recall are summarized below. A confusion matrix was also produced for the test set.
Label | Precision | Recall |
---|---|---|
Negative | 83% | 77% |
Neutral | 78% | 91% |
Positive | 93% | 80% |
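The following is a minimal sketch of this setup, assuming scikit-learn. TF-IDF features, which Next Steps implies the project used, feed an SVC, and `classification_report` and `confusion_matrix` produce a per-class precision and recall summary like the one above. The linear kernel, `C=1.0`, the bigram range, and the helper names `build_model` and `evaluate` are assumptions, because the project's hyperparameters are not stated here. The toy tweets only demonstrate the API and say nothing about real performance.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

LABELS = ["negative", "neutral", "positive"]


def build_model():
    # TF-IDF over unigrams and bigrams feeding a linear-kernel SVC (assumed settings).
    return make_pipeline(
        TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
        SVC(kernel="linear", C=1.0),
    )


def evaluate(model, X_test, y_test):
    y_pred = model.predict(X_test)
    # Per-class precision and recall, as in the results table.
    print(classification_report(y_test, y_pred, labels=LABELS, zero_division=0))
    # Rows are true labels and columns are predicted labels, both in LABELS order.
    cm = confusion_matrix(y_test, y_pred, labels=LABELS)
    return accuracy_score(y_test, y_pred), cm


# Toy data standing in for the 'selected_text' / 'sentiment' columns.
train_text = ["love this phone", "so happy today", "great service",
              "hate this phone", "so sad today", "awful service",
              "phone arrived today", "at the office", "it is tuesday"]
train_labels = ["positive"] * 3 + ["negative"] * 3 + ["neutral"] * 3

model = build_model().fit(train_text, train_labels)
acc, cm = evaluate(model, ["love it", "hate it", "office on tuesday"],
                   ["positive", "negative", "neutral"])
```

Accuracy is the diagonal of the confusion matrix divided by its total. This is why a class with high recall and many test examples, such as neutral here, pulls the overall figure toward its own recall.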
Next Steps
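- Semantic Embeddings: Experiment with Word2Vec embeddings instead of frequency-based TF-IDF features, so that words with similar meanings map to similar vectors.
- Dimensionality Reduction: Investigate whether reducing the feature space with UMAP improves model performance. t-SNE is better suited to visualization, since it cannot project new, unseen tweets without refitting.
- Real-time Classification: Deploy the model as a web service that classifies incoming tweets as they arrive.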
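One way to test the Word2Vec idea is to average each tweet's word vectors into a single feature vector. The following is a minimal sketch, assuming scikit-learn and NumPy. `MeanEmbeddingVectorizer` is a hypothetical helper, and the toy two-dimensional vectors stand in for trained embeddings.

```python
import re

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MeanEmbeddingVectorizer(BaseEstimator, TransformerMixin):
    """Represent each text as the mean of its known tokens' word vectors.

    `word_vectors` can be any object that supports `token in word_vectors` and
    `word_vectors[token]`, such as a plain dict of NumPy arrays.
    """

    def __init__(self, word_vectors=None, dim=100):
        self.word_vectors = word_vectors
        self.dim = dim

    def fit(self, X, y=None):
        # Nothing to learn here: the embeddings are trained or loaded beforehand.
        return self

    def transform(self, X):
        rows = []
        for text in X:
            tokens = re.findall(r"\w+", str(text).lower())
            vectors = [self.word_vectors[t] for t in tokens if t in self.word_vectors]
            # Tweets with no known words fall back to a zero vector.
            rows.append(np.mean(vectors, axis=0) if vectors else np.zeros(self.dim))
        return np.vstack(rows)


# Toy embeddings (hypothetical) just to show the averaging.
toy_vectors = {"love": np.array([1.0, 0.0]), "great": np.array([0.0, 1.0])}
vec = MeanEmbeddingVectorizer(toy_vectors, dim=2)
features = vec.fit_transform(["Love, great!", "no known words"])
# features[0] is [0.5, 0.5]; features[1] is [0.0, 0.0]
```

With gensim 4.x, trained vectors could come from `Word2Vec(sentences=tokenized_tweets, vector_size=100).wv`, where `tokenized_tweets` is a hypothetical list of token lists. You would pass those vectors as `word_vectors` with `dim=100`, and the transformer could then replace `TfidfVectorizer` in a `make_pipeline(..., SVC())` pipeline. Averaging discards word order. It therefore trades TF-IDF's exact-term signal for similarity between related words, rather than modeling full context.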
This project highlights the importance of sentiment analysis in brand reputation management and provides a foundation for further development and deployment in a real-world setting.