Twitter/X
Twitter/X serves over 500 million users globally, processing hundreds of millions of tweets and billions of interactions daily. This document outlines the architecture that enables real-time social networking at that scale with high availability.
High-Level Architecture
Core Components
1. Tweet Distribution System
Twitter's fanout architecture handles tweet delivery to millions of followers.
Fanout Strategy:
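- Push Fanout (Normal Users): Fanout-on-write; each tweet is written into followers' timelines as soon as it is posted
- Pull Fanout (Celebrities): Fanout-on-read; tweets are fetched when a follower loads a timeline, because writing to millions of timelines per tweet is too costly
- Hybrid Model: Chooses between the two strategies per author, based on a follower count threshold (~1M followers)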
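The hybrid model above can be sketched in a few lines. This is a minimal in-memory illustration, not Twitter's implementation: the class name `HybridFanout`, its methods, and the in-process dictionaries are all assumptions made for the example, and the threshold simply reuses the ~1M figure from the list.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000  # assumed cutoff, taken from the ~1M figure above


class HybridFanout:
    """Toy hybrid fanout: push for normal authors, pull for celebrities."""

    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)       # author -> follower ids
        self.following = defaultdict(set)       # user -> followed authors
        self.timelines = defaultdict(list)      # user -> pushed (ts, tweet_id)
        self.author_tweets = defaultdict(list)  # author -> own (ts, tweet_id)

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def post(self, author, tweet_id, ts):
        self.author_tweets[author].append((ts, tweet_id))
        if not self.is_celebrity(author):
            # Push path: write into every follower's timeline at post time.
            for follower in self.followers[author]:
                self.timelines[follower].append((ts, tweet_id))

    def home_timeline(self, user, limit=20):
        # Start from the precomputed (pushed) entries; a set removes duplicates.
        entries = set(self.timelines[user])
        # Pull path: merge celebrity tweets at read time.
        for author in self.following[user]:
            if self.is_celebrity(author):
                entries.update(self.author_tweets[author])
        newest_first = sorted(entries, reverse=True)
        return [tweet_id for _, tweet_id in newest_first[:limit]]
```

The trade-off is that push makes reads cheap but costs one write per follower, while pull keeps writes constant and moves the merge work to read time. The sketch ignores authors who cross the threshold after posting, which a real system would need to handle.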
Key Features:
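- Write operations: ~6,000 tweets/second on average (consistent with 500+ million tweets/day), with a recorded peak of roughly 143,000 tweets/second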
- Read operations: ~600,000 timeline requests/second
- Fanout to millions of followers in <5 seconds
- Content filtering and safety checks
2. Timeline Service Architecture
Timeline Components:
- Home Timeline: Chronological + algorithmic ranking
- User Timeline: User's own tweets
- Mentions Timeline: Tweets mentioning the user
- List Timelines: Curated user lists
Ranking Signals:
- Tweet recency and engagement
- User relationship strength
- Content type preferences
- Spam and quality scores
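To make the ranking signals concrete, here is a hedged sketch of how they might be combined into one score. The formula, the weights, and the `Candidate` fields are illustrative assumptions; production ranking is a learned model, not a hand-written expression like this.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    tweet_id: str
    age_seconds: float     # recency signal
    engagements: int       # likes + retweets + replies
    relationship: float    # 0..1, strength of the viewer-author relationship
    type_affinity: float   # 0..1, viewer preference for this content type
    quality: float         # 0..1, where 0 means almost certainly spam


def score(c, half_life_s=3600.0):
    """Hypothetical scoring formula combining the four signal groups."""
    recency = 0.5 ** (c.age_seconds / half_life_s)  # halves every half-life
    engagement = math.log1p(c.engagements)          # damp very large counts
    relevance = recency * (1.0 + engagement) + c.relationship + c.type_affinity
    return c.quality * relevance                    # spam score gates everything


def rank(candidates):
    ordered = sorted(candidates, key=score, reverse=True)
    return [c.tweet_id for c in ordered]
```

Multiplying by the quality score means a tweet flagged as spam ranks last regardless of its engagement, which is one simple way to keep spam from gaming the other signals.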
3. Real-Time Search Architecture
Search Components:
- Earlybird: Custom real-time search engine built on Lucene
- Inverted Index: Tweet content, hashtags, mentions
- Time-based Partitioning: Recent tweets prioritized
- Distributed Query Execution: Parallel search across partitions
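The following sketch shows the idea behind an inverted index partitioned by time, with the newest partitions searched first. It is a toy, single-process illustration and not Earlybird itself: `TimePartitionedIndex`, the whitespace tokenizer, and the assumption that larger tweet IDs are newer are all choices made for the example.

```python
from collections import defaultdict


class TimePartitionedIndex:
    """Toy inverted index split into fixed-width time buckets."""

    def __init__(self, partition_seconds=3600):
        self.partition_seconds = partition_seconds
        # bucket number -> term -> set of tweet ids
        self.partitions = defaultdict(lambda: defaultdict(set))

    def add(self, tweet_id, text, ts):
        bucket = ts // self.partition_seconds
        for term in text.lower().split():  # hashtags and mentions are just terms
            self.partitions[bucket][term].add(tweet_id)

    def search(self, query, limit=10):
        terms = query.lower().split()
        if not terms:
            return []
        results = []
        # Visit the newest partition first so recent tweets are prioritized.
        for bucket in sorted(self.partitions, reverse=True):
            index = self.partitions[bucket]
            postings = [index.get(term, set()) for term in terms]
            matches = set.intersection(*postings)  # AND semantics
            results.extend(sorted(matches, reverse=True))
            if len(results) >= limit:
                break  # older partitions are never touched
        return results[:limit]
```

In a distributed deployment each partition would live on its own set of machines and the loop would become parallel requests whose results are merged; the early exit is what makes recent-first partitioning cheap for common queries.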
Search Features:
- Real-time indexing (<10 seconds)
- Full-text search with operators
- Trending topics detection
- Spam and quality filtering
4. User Service
Key Features:
- OAuth 2.0 authentication
- Social graph storage (FlockDB)
- Follow/unfollow operations
- User verification system
- Privacy and security settings
5. Direct Message Service
DM Features:
- End-to-end encryption option
- Real-time message delivery
- Group conversations
- Media sharing (images, videos, GIFs)
- Read receipts and typing indicators
6. Notification Service
Notification Types:
- Engagement notifications (likes, retweets, replies)
- Social notifications (new followers, mentions)
- Direct message notifications
- Trending topic alerts
- Personalized recommendations
Data Storage Architecture
1. Manhattan (Distributed Key-Value Store)
Manhattan Use Cases:
- Tweet storage (tweet ID → tweet data)
- Direct messages
- User timelines
- Low-latency key-value operations
Features:
- Geo-replicated across data centers
- Strong consistency within datacenter
- Eventual consistency across regions
- Millions of operations per second
2. MySQL Clusters
MySQL Usage:
- User account data
- Tweet metadata
- Relationships and social graph
- Application configuration
Sharding Strategy:
- User ID-based sharding
- Horizontal scaling to 1000+ shards
- Read replicas for query distribution
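User ID-based sharding with read replicas can be illustrated with a small routing function. The shard count, the node naming scheme, and the helper names `shard_for_user` and `pick_node` are hypothetical; the sketch only shows the routing decision, not connection handling.

```python
import hashlib

NUM_SHARDS = 1024  # assumed value, in line with "1000+ shards" above


def shard_for_user(user_id, num_shards=NUM_SHARDS):
    """Map a user ID to a shard using a stable hash."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def pick_node(user_id, is_write, replicas_per_shard=2):
    """Send writes to the shard primary and spread reads over replicas."""
    shard = shard_for_user(user_id)
    if is_write:
        return f"shard-{shard:04d}-primary"
    replica = user_id % replicas_per_shard
    return f"shard-{shard:04d}-replica-{replica}"
```

Hashing the ID rather than using it directly avoids hot shards when IDs are allocated sequentially. A plain modulo does mean that changing the shard count remaps most users, which is why larger systems often use consistent hashing or a lookup table instead.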
3. Redis Cache Architecture
Redis Use Cases:
- Timeline caching (home, user, mentions)
- Session storage
- Real-time counters (likes, retweets)
- Rate limiting counters
- Recent notification cache
4. Cassandra
Cassandra Use Cases:
- Analytics and metrics
- Application logs
- Historical tweet archives
- Time-series data
5. Hadoop HDFS
HDFS Use Cases:
- Data warehousing
- Batch analytics processing
- Machine learning training data
- Long-term data archival
Scalability & Performance
1. Tweet Write Path
Write Optimization:
- Asynchronous fanout processing
- Batch timeline updates
- Parallel writes to multiple storage systems
- Write-through cache strategy
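A write-through cache updates the backing store and the cache in the same operation, so reads that follow a write never see stale data. The sketch below uses plain dictionaries as stand-ins for the database and the cache; `WriteThroughCache` is a hypothetical name for illustration.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a dict-like backing store."""

    def __init__(self, store):
        self.store = store  # stand-in for the database
        self.cache = {}     # stand-in for Redis or similar

    def put(self, key, value):
        # Write the durable store first; only cache what was persisted.
        self.store[key] = value
        self.cache[key] = value

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store[key]  # raises KeyError if the key does not exist
        self.cache[key] = value  # populate on miss
        return value
```

Writing to the store before the cache keeps the cache from holding values that failed to persist. The cost is higher write latency, which is one reason the fanout work listed above is done asynchronously.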
2. Timeline Read Path
Read Optimization:
- Multi-level caching strategy
- Prefetching popular content
- Partial timeline rendering
- Lazy loading of media
3. Horizontal Scaling
Scaling Strategies:
- Geo-distributed data centers
- Auto-scaling based on traffic patterns
- Service mesh for inter-service communication
- Database sharding by user ID
4. Caching Strategy
Cache Hierarchy:
- L1: CDN (static assets, profile images)
- L2: Edge cache (API responses)
- L3: Redis (timelines, sessions)
- L4: Database query cache
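A lookup through such a hierarchy checks the fastest tier first and backfills faster tiers on a hit lower down. The sketch treats every tier as a generic dictionary, which is a simplification: in practice the CDN and Redis hold different kinds of content. `TieredCache` and its `loader` callback are assumptions made for the example.

```python
class TieredCache:
    """Toy multi-level cache: tiers are checked fastest first."""

    def __init__(self, tiers, loader):
        self.tiers = tiers    # list of (name, dict), fastest tier first
        self.loader = loader  # called on a full miss, e.g. a database query

    def get(self, key):
        for i, (name, tier) in enumerate(self.tiers):
            if key in tier:
                value = tier[key]
                # Backfill the faster tiers that missed.
                for _, upper in self.tiers[:i]:
                    upper[key] = value
                return value, name
        value = self.loader(key)
        for _, tier in self.tiers:
            tier[key] = value
        return value, "origin"
```

Returning the name of the tier that served the request makes it easy to measure per-tier hit ratios, which is how a target such as a 95% cache hit ratio would be tracked.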
Real-Time Features
1. Live Streaming Architecture
Live Features:
- Live video built on Periscope technology (the standalone Periscope app was discontinued in 2021)
- Real-time chat
- Live reactions and engagement
- Low-latency streaming (~3-5 seconds)
2. Trending Topics
Trending Algorithm:
- Real-time tweet velocity tracking
- Engagement-based scoring
- Spam and abuse filtering
- Geographic personalization
- Recency weighting
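Velocity tracking, recency weighting, and spam filtering can be combined into a simple trending score. The sketch below is an assumption-laden illustration: the exponential decay, the baseline normalization, and the function names are not taken from Twitter's algorithm, and geographic personalization is left out.

```python
from collections import Counter


def trending_scores(events, now, half_life_s=600.0, baseline=None,
                    spam_terms=frozenset()):
    """Score terms by recency-weighted mention volume.

    events: iterable of (term, timestamp) pairs.
    baseline: optional dict of each term's usual volume, so that terms
              that are always popular do not trend permanently.
    """
    baseline = baseline or {}
    weighted = Counter()
    for term, ts in events:
        if term in spam_terms or ts > now:
            continue  # drop filtered terms and events from the future
        # Each mention counts for less as it ages (recency weighting).
        weighted[term] += 0.5 ** ((now - ts) / half_life_s)
    return {term: w / (1.0 + baseline.get(term, 0.0))
            for term, w in weighted.items()}


def top_trends(scores, k=3):
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ordered[:k]]
```

Dividing by the baseline is what separates "trending" from "popular": a term mentioned heavily every day scores lower than a normally quiet term with a sudden burst.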
3. Real-Time Recommendations
Recommendation Types:
- Who to follow suggestions
- Tweet recommendations
- Topic suggestions
- Trending content
Machine Learning Infrastructure
1. ML Pipeline
ML Use Cases:
- Timeline ranking
- Content recommendations
- Spam detection
- Image/video classification
- Trend prediction
- Ad targeting
2. Content Safety & Moderation
Safety Features:
- Automated spam detection
- Abusive content filtering
- Sensitive media detection
- Misinformation labeling
- Human-in-the-loop review
Security Architecture
Security Measures:
- Authentication: OAuth 2.0, JWTs
- Encryption: TLS 1.3, AES-256 at rest
- DDoS Protection: Multi-layered defense
- API Security: Rate limiting, key rotation
- Account Security: MFA, login verification
Monitoring & Observability
Monitoring Metrics:
- System Metrics: CPU, memory, disk, network
- Application Metrics: Request latency, error rates
- Business Metrics: Tweet volume, user engagement
- Custom Metrics: Timeline generation time, fanout latency
Alerting:
- Critical Alerts: Service outages, data loss
- Warning Alerts: High latency, resource saturation
- Anomaly Alerts: Unusual traffic patterns
- SLA Monitoring: 99.9% uptime target
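An uptime target is easier to reason about as a downtime budget. The helper below is a small worked calculation; the function name is hypothetical. For a 99.9% target it gives about 43.2 minutes of allowed downtime per 30-day month and about 8.76 hours per 365-day year.

```python
def downtime_budget_minutes(availability, period_days):
    """Allowed downtime, in minutes, for an availability target over a period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_days * 24 * 60


monthly = downtime_budget_minutes(0.999, 30)   # about 43.2 minutes
yearly = downtime_budget_minutes(0.999, 365)   # about 525.6 minutes
```

Alerting thresholds are often derived from this budget, for example by paging when the current error rate would consume the remaining monthly budget within a few hours.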
Infrastructure & DevOps
1. Multi-Cloud Architecture
Infrastructure Strategy:
- Multi-cloud approach (AWS, GCP)
- Hybrid cloud with on-premise data centers
- Global CDN presence
- Cost optimization across providers
2. Deployment Pipeline
Deployment Strategy:
- Continuous Integration/Continuous Deployment (CI/CD)
- Canary deployments for risk mitigation
- Blue-green deployments for zero downtime
- Automated rollback on failures
- Feature flags for controlled rollouts
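Canary releases and feature flags both depend on exposing a change to a stable fraction of users. One common way to do this, shown here as a sketch rather than a description of Twitter's tooling, is to hash the flag name and user ID into a bucket from 0 to 99. The function name `in_rollout` is an assumption.

```python
import hashlib


def in_rollout(flag, user_id, percent):
    """Return True if this user falls inside the rollout percentage.

    The bucket depends only on (flag, user_id), so a user keeps the same
    decision across requests, and raising `percent` only ever adds users.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Including the flag name in the hash gives each flag an independent set of early users, so the same accounts are not always the first to receive every experiment. Rolling back is a matter of setting the percentage to 0.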
3. Infrastructure as Code
IaC Components:
- Terraform for cloud resource provisioning
- Ansible for server configuration
- Kubernetes for container orchestration
- GitOps workflow for changes
4. Disaster Recovery
DR Metrics:
- RTO (Recovery Time Objective): < 1 hour
- RPO (Recovery Point Objective): < 5 minutes
- Data Backup: Multiple geographic locations
- Automated Failover: Cross-region redundancy
Performance Optimization
1. Timeline Generation Performance
Performance Techniques:
- Predictive prefetching based on user behavior
- Parallel data fetching from multiple sources
- Edge caching for frequently accessed timelines
- Progressive rendering for faster perceived load time
2. Media Optimization
Media Features:
- Automatic image compression (up to 85% size reduction)
- Multiple format support (WebP, AVIF, JPEG)
- Responsive images based on device
- Video transcoding for multiple bitrates
- Lazy loading for off-screen media
3. Database Query Optimization
Optimization Strategies:
- Strategic indexing on high-traffic queries
- Denormalization for read-heavy operations
- Connection pooling to reduce overhead
- Query result caching with TTL
- Database sharding for horizontal scaling
Content Delivery
1. CDN Architecture
CDN Features:
- Global edge network (100+ locations)
- Smart routing based on geography
- Cache hit ratio > 95%
- Image optimization and transformation
- Video streaming with adaptive bitrate
2. Asset Pipeline
Asset Optimization:
- JavaScript/CSS minification
- Module bundling and code splitting
- Brotli compression for text assets
- Image sprites for icons
- Content hashing for cache busting
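Content hashing for cache busting means embedding a digest of the file's bytes in its name, so the URL changes only when the content does. The helper below is a minimal sketch; `hashed_name` and the 8-character digest length are illustrative choices, and real bundlers do this as part of the build.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(path, content, length=8):
    """Insert a short content digest before the file extension."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:length]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Because the name is derived from the content, such assets can be served with very long cache lifetimes: a new release produces new URLs, and unchanged files keep their cached copies.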
Analytics & Business Intelligence
1. Analytics Pipeline
Analytics Use Cases:
- User engagement metrics
- Tweet performance analytics
- Revenue and business metrics
- A/B testing analysis
- Fraud detection
2. Key Performance Indicators
Target Metrics:
- DAU: 250+ million daily active users
- Tweets/Day: 500+ million tweets
- API Latency: P95 < 200ms
- Availability: 99.9% uptime SLA
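A latency target such as P95 < 200ms is checked by computing a percentile over observed request latencies. The sketch below uses the nearest-rank method, which is one of several percentile definitions; the function name and sample data are assumptions for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("percentile of an empty sample is undefined")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


latencies_ms = list(range(1, 101))  # 1 ms .. 100 ms, made-up data
meets_target = percentile(latencies_ms, 95) < 200
```

Percentiles are preferred over averages for latency targets because a small share of very slow requests can hide behind a healthy mean.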
Mobile Architecture
1. Mobile App Architecture
Mobile Features:
- Offline timeline caching
- Background tweet synchronization
- Image/video compression before upload
- Progressive image loading
- Battery and data optimization
2. Push Notification System
Notification Strategy:
- Intelligent notification batching
- User preference-based filtering
- Quiet hours and do-not-disturb
- Rich notifications with media
- Deep linking to relevant content
Cost Optimization
Optimization Strategies
Cost Reduction Tactics:
- Reserved instances for stable workloads (30-50% savings)
- Spot instances for batch processing (up to 90% savings)
- Aggressive caching to reduce compute load
- Data compression and deduplication
- Multi-cloud strategy for competitive pricing
API Architecture
1. REST API
API Features:
- RESTful design principles
- OAuth 2.0 authentication
- Rate limiting (per endpoint, per user)
- Webhook support for real-time updates
- Comprehensive error handling
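Rate limiting per endpoint and per user can be sketched with a token bucket keyed on the (user, endpoint) pair. This is an in-memory illustration under stated assumptions: the class name, the capacity and refill values, and the injectable clock are all hypothetical, and a production limiter would keep its counters in a shared store such as Redis.

```python
import time


class TokenBucketLimiter:
    """Toy token bucket with one bucket per (user, endpoint) pair."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock   # injectable so the limiter can be tested
        self.buckets = {}    # (user_id, endpoint) -> (tokens, last_seen)

    def allow(self, user_id, endpoint):
        key = (user_id, endpoint)
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self.buckets[key] = (tokens, now)
        return allowed
```

A token bucket permits short bursts up to `capacity` while enforcing the average rate over time, which tends to suit API clients better than a hard fixed window. A rejected request would typically be answered with HTTP 429.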
2. GraphQL API
GraphQL Benefits:
- Flexible data fetching (request only needed fields)
- Single request for multiple resources
- Strong typing and schema validation
- Efficient for mobile clients (reduced bandwidth)
Ads Platform Architecture
Ad Features:
- Promoted tweets
- Promoted accounts
- Promoted trends
- Real-time bidding (RTB)
- Sophisticated targeting (demographics, interests, behaviors)
- Performance analytics and reporting
Future Architecture Evolution
Emerging Technologies
Scalability Roadmap
- User Growth: Support 1 billion+ users
- Real-time Processing: Sub-second global propagation
- AI Integration: Smarter recommendations and moderation
- New Content Types: Audio, long-form, video
- Global Expansion: Low-latency access worldwide
Conclusion
Twitter/X's architecture illustrates how to build a highly scalable, real-time social networking platform. The combination of intelligent caching, efficient fanout mechanisms, robust data storage, and advanced machine learning enables the platform to handle billions of interactions daily while maintaining sub-second response times.
Key architectural principles:
- Real-time First: Optimized for immediate content distribution
- Horizontal Scalability: Services scale independently
- Data Locality: Cache and store data near users
- Fault Tolerance: Graceful degradation and quick recovery
- Continuous Evolution: Adapting to new technologies and user needs
The platform continues to evolve, incorporating new features and optimizations to meet growing demands while maintaining the speed and reliability users expect from a real-time social network.
This document reflects publicly known Twitter/X systems and common best practices. Actual implementation details may vary as the platform continues to evolve.