Twitter/X

🐦 Twitter/X serves over 500 million users globally, processing hundreds of millions of tweets and billions of interactions daily. This document outlines the architecture that enables real-time social networking at massive scale with high availability.

High-Level Architecture

Core Components

1. Tweet Distribution System

Twitter's fanout architecture handles tweet delivery to millions of followers.

Fanout Strategy:

  • Push Fanout (Normal Users): Tweets pushed to followers' timelines immediately
  • Pull Fanout (Celebrities): Tweets fetched on-demand due to massive follower counts
  • Hybrid Model: Combines both strategies based on follower count threshold (~1M followers)
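The hybrid model can be sketched in a few lines. This is a minimal in-memory illustration, assuming a single follower-count threshold; the `HybridFanout` class, its method names, and the 800-entry timeline cap are hypothetical, and a real system would keep timelines in a distributed cache rather than Python dictionaries.

```python
from collections import defaultdict, deque


class HybridFanout:
    """Toy hybrid fanout (hypothetical): push for normal accounts, pull for high-follower ones."""

    def __init__(self, threshold=1_000_000, timeline_cap=800):
        self.threshold = threshold  # follower count at which pushing stops (assumed value)
        self.followers = defaultdict(set)  # author -> set of followers
        self.following = defaultdict(set)  # user -> set of followed authors
        # Cached home timelines of (timestamp, tweet_id), newest first; the deque
        # drops the oldest entry once the (assumed) cap is reached.
        self.timelines = defaultdict(lambda: deque(maxlen=timeline_cap))
        self.authored = defaultdict(list)  # author -> [(timestamp, tweet_id)]

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_pull_account(self, author):
        return len(self.followers[author]) >= self.threshold

    def post_tweet(self, author, tweet_id, ts):
        self.authored[author].append((ts, tweet_id))
        if not self.is_pull_account(author):
            # Push: write the tweet into every follower's cached timeline at write time.
            for follower in self.followers[author]:
                self.timelines[follower].appendleft((ts, tweet_id))

    def read_home_timeline(self, user, limit=20):
        entries = list(self.timelines[user])
        # Pull: merge recent tweets from high-follower accounts at read time.
        for author in self.following[user]:
            if self.is_pull_account(author):
                entries.extend(self.authored[author][-limit:])
        entries.sort(reverse=True)
        return [tweet_id for _, tweet_id in entries[:limit]]
```

Pushing costs one write per follower, so a tweet from an account with millions of followers would trigger millions of writes; pulling moves that cost to read time instead. The sketch ignores accounts that cross the threshold between posting and reading.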

Key Features:

  • Write operations: ~6,000 tweets/second on average (500+ million tweets/day), with record peaks above 140,000 tweets/second
  • Read operations: ~600,000 timeline requests/second
  • Fanout to millions of followers in <5 seconds
  • Content filtering and safety checks

2. Timeline Service Architecture

Timeline Components:

  • Home Timeline: Chronological + algorithmic ranking
  • User Timeline: User's own tweets
  • Mentions Timeline: Tweets mentioning the user
  • List Timelines: Curated user lists

Ranking Signals:

  • Tweet recency and engagement
  • User relationship strength
  • Content type preferences
  • Spam and quality scores
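The ranking signals above can be folded into a single score per candidate tweet. The weights, the 6-hour half-life, and the multiplicative form below are illustrative assumptions, not Twitter/X's ranker; production systems learn these relationships with ML models rather than hand-tuning them.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    tweet_id: int
    age_hours: float
    likes: int
    retweets: int
    replies: int
    author_affinity: float  # 0..1, hypothetical relationship-strength signal
    type_preference: float  # 0..1, viewer's affinity for this content type
    spam_score: float  # 0..1, higher means more likely spam


HALF_LIFE_HOURS = 6.0  # assumed: a tweet's recency weight halves every 6 hours


def score(c: Candidate) -> float:
    recency = 0.5 ** (c.age_hours / HALF_LIFE_HOURS)
    # log1p dampens runaway engagement counts; the per-action weights are assumptions.
    engagement = math.log1p(c.likes + 2 * c.retweets + 3 * c.replies)
    relevance = 1.0 + c.author_affinity + 0.5 * c.type_preference
    quality = 1.0 - c.spam_score  # a certain-spam tweet scores zero
    return recency * engagement * relevance * quality


def rank(candidates):
    return sorted(candidates, key=score, reverse=True)
```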

3. Real-Time Search Architecture

Search Components:

  • Earlybird: Custom real-time search engine built on Apache Lucene
  • Inverted Index: Tweet content, hashtags, mentions
  • Time-based Partitioning: Recent tweets prioritized
  • Distributed Query Execution: Parallel search across partitions
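To show how time-based partitioning lets recent tweets be found first, here is a toy in-memory inverted index. The hourly partition size, the tokenizer, and the class name are assumptions; this version walks partitions newest-first in sequence, whereas a production engine would fan the query out to partitions in parallel.

```python
import re
from collections import defaultdict

PARTITION_SECONDS = 3600  # assumed: one index partition per hour of tweets


def tokenize(text):
    # Lowercase words; #hashtags and @mentions stay single tokens.
    return set(re.findall(r"[#@]?\w+", text.lower()))


class TimePartitionedIndex:
    """Hypothetical index: partition id -> term -> set of (timestamp, tweet_id)."""

    def __init__(self):
        self.partitions = defaultdict(lambda: defaultdict(set))

    def add(self, tweet_id, ts, text):
        part = self.partitions[ts // PARTITION_SECONDS]
        for term in tokenize(text):
            part[term].add((ts, tweet_id))

    def search(self, query, limit=10):
        terms = tokenize(query)
        results = []
        # Newest partitions first, stopping early once enough hits are found.
        for pid in sorted(self.partitions, reverse=True):
            part = self.partitions[pid]
            postings = [part.get(t, set()) for t in terms]
            hits = set.intersection(*postings) if postings else set()  # AND semantics
            results.extend(sorted(hits, reverse=True))
            if len(results) >= limit:
                break
        return [tweet_id for _, tweet_id in results[:limit]]
```

The early exit is the point of partitioning by time: most queries are satisfied by the newest partitions, so older ones are never touched.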

Search Features:

  • Real-time indexing (<10 seconds)
  • Full-text search with operators
  • Trending topics detection
  • Spam and quality filtering

4. User Service

Key Features:

  • OAuth 2.0 authentication
  • Social graph storage (FlockDB)
  • Follow/unfollow operations
  • User verification system
  • Privacy and security settings

5. Direct Message Service

DM Features:

  • End-to-end encryption option
  • Real-time message delivery
  • Group conversations
  • Media sharing (images, videos, GIFs)
  • Read receipts and typing indicators

6. Notification Service

Notification Types:

  • Engagement notifications (likes, retweets, replies)
  • Social notifications (new followers, mentions)
  • Direct message notifications
  • Trending topic alerts
  • Personalized recommendations

Data Storage Architecture

1. Manhattan (Distributed Key-Value Store)

Manhattan Use Cases:

  • Tweet storage (tweet ID → tweet data)
  • Direct messages
  • User timelines
  • Low-latency key-value operations

Features:

  • Geo-replicated across data centers
  • Strong consistency within datacenter
  • Eventual consistency across regions
  • Billions of operations per second

2. MySQL Clusters

MySQL Usage:

  • User account data
  • Tweet metadata
  • Relationships and social graph
  • Application configuration

Sharding Strategy:

  • User ID-based sharding
  • Horizontal scaling to 1000+ shards
  • Read replicas for query distribution
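A minimal sketch of user ID-based shard routing with read replicas, assuming hash-modulo placement and round-robin replica selection; the `ShardRouter` class and the replica naming are hypothetical.

```python
import hashlib
import itertools


class ShardRouter:
    """Hypothetical router: user ID -> shard; writes to primary, reads round-robin over replicas."""

    def __init__(self, num_shards, replicas_per_shard):
        self.num_shards = num_shards
        # One independent round-robin iterator per shard.
        self._replica_cycles = [
            itertools.cycle(range(replicas_per_shard)) for _ in range(num_shards)
        ]

    def shard_for(self, user_id):
        # Hash first so sequential user IDs spread evenly across shards.
        digest = hashlib.sha256(str(user_id).encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def route(self, user_id, write):
        shard = self.shard_for(user_id)
        if write:
            return shard, "primary"  # all writes for a user land on one primary
        return shard, f"replica-{next(self._replica_cycles[shard])}"
```

Plain modulo hashing is the simplest placement rule, but changing the shard count remaps most users, which is why large deployments often use a lookup table or consistent hashing instead.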

3. Redis Cache Architecture

Redis Use Cases:

  • Timeline caching (home, user, mentions)
  • Session storage
  • Real-time counters (likes, retweets)
  • Rate limiting counters
  • Recent notification cache
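Rate-limiting counters in Redis are commonly built as fixed windows: `INCR` a key that identifies the caller and the current window, and `EXPIRE` it so stale windows age out. The sketch below emulates that pattern in memory so it runs without a Redis server; the class name, the per-(user, endpoint) key, and the injectable clock are assumptions.

```python
import time


class FixedWindowLimiter:
    """In-memory stand-in for the Redis INCR + EXPIRE fixed-window pattern (hypothetical)."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters = {}  # (user_id, endpoint) -> (window_start, count)

    def allow(self, user_id, endpoint):
        now = self.clock()
        window_start = int(now // self.window) * self.window
        key = (user_id, endpoint)
        start, count = self.counters.get(key, (window_start, 0))
        if start != window_start:
            start, count = window_start, 0  # previous window expired (Redis: key TTL elapsed)
        count += 1  # Redis: INCR, which also counts rejected attempts
        self.counters[key] = (start, count)
        return count <= self.limit
```

Fixed windows are cheap (one counter per key) but allow a burst of up to twice the limit across a window boundary; sliding windows or token buckets trade extra state for smoother limits.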

4. Cassandra

Cassandra Use Cases:

  • Analytics and metrics
  • Application logs
  • Historical tweet archives
  • Time-series data

5. Hadoop HDFS

HDFS Use Cases:

  • Data warehousing
  • Batch analytics processing
  • Machine learning training data
  • Long-term data archival

Scalability & Performance

1. Tweet Write Path

Write Optimization:

  • Asynchronous fanout processing
  • Batch timeline updates
  • Parallel writes to multiple storage systems
  • Write-through cache strategy

2. Timeline Read Path

Read Optimization:

  • Multi-level caching strategy
  • Prefetching popular content
  • Partial timeline rendering
  • Lazy loading of media

3. Horizontal Scaling

Scaling Strategies:

  • Geo-distributed data centers
  • Auto-scaling based on traffic patterns
  • Service mesh for inter-service communication
  • Database sharding by user ID

4. Caching Strategy

Cache Hierarchy:

  • L1: CDN (static assets, profile images)
  • L2: Edge cache (API responses)
  • L3: Redis (timelines, sessions)
  • L4: Database query cache
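The hierarchy works as a chain of fallbacks: check the fastest level first, fall through on a miss, and backfill the faster levels on the way back. This sketch models each level as a plain dictionary; `CacheLevel`, `tiered_get`, and the `load_from_db` callback are hypothetical, and real levels would each carry their own TTLs and eviction policies.

```python
class CacheLevel:
    """Hypothetical cache tier backed by a dict."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def put(self, key, value):
        self.store[key] = value


def tiered_get(key, levels, load_from_db):
    """Return (value, source). On a hit, backfill every faster level above the hit."""
    for i, level in enumerate(levels):
        value = level.get(key)
        if value is not None:
            for upper in levels[:i]:
                upper.put(key, value)
            return value, level.name
    # Miss everywhere: load from the database and populate all levels.
    value = load_from_db(key)
    for level in levels:
        level.put(key, value)
    return value, "db"
```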

Real-Time Features

1. Live Streaming Architecture

Live Features:

  • Live video broadcasting (originally via Periscope, which was shut down in 2021 and folded into the main app)
  • Real-time chat
  • Live reactions and engagement
  • Low-latency streaming (~3-5 seconds)

2. Trending Topics

Trending Algorithm:

  • Real-time tweet velocity tracking
  • Engagement-based scoring
  • Spam and abuse filtering
  • Geographic personalization
  • Recency weighting
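Velocity tracking with recency weighting can be sketched as "mentions now versus what is normal for this topic". Below, the baseline is an exponentially weighted moving average (EWMA), so recent windows count more than old ones; the function names, the minimum-count cutoff, and the smoothing constant are assumptions, and spam filtering and geographic personalization are left out.

```python
def trending_scores(current, baseline, min_count=10, smoothing=1.0):
    """Rank topics by current-window mentions relative to their historical baseline."""
    scores = {}
    for topic, count in current.items():
        if count < min_count:
            continue  # ignore noise from rarely mentioned terms
        expected = baseline.get(topic, 0.0) + smoothing  # smoothing avoids divide-by-zero
        scores[topic] = count / expected
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def update_baseline(baseline, current, alpha=0.1):
    """EWMA update: higher alpha weights the latest window more heavily."""
    topics = set(baseline) | set(current)
    return {
        t: (1 - alpha) * baseline.get(t, 0.0) + alpha * current.get(t, 0)
        for t in topics
    }
```

Dividing by the baseline is what separates a trend from a merely popular topic: a steady high-volume hashtag scores near 1, while a sudden spike scores far higher.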

3. Real-Time Recommendations

Recommendation Types:

  • Who to follow suggestions
  • Tweet recommendations
  • Topic suggestions
  • Trending content
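A simple way to generate "who to follow" candidates is a friends-of-friends count: suggest accounts that many of the people you follow also follow. This is a common baseline heuristic, not Twitter/X's production algorithm; the function name and the dict-of-sets graph format are assumptions.

```python
from collections import Counter


def who_to_follow(graph, user, k=3):
    """graph: user -> set of accounts they follow. Returns up to k suggestions."""
    already = graph.get(user, set())
    counts = Counter()
    for friend in already:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in already:
                counts[candidate] += 1  # one more mutual connection
    # Most mutual connections first; ties broken by name for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [candidate for candidate, _ in ranked[:k]]
```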

Machine Learning Infrastructure

1. ML Pipeline

ML Use Cases:

  • Timeline ranking
  • Content recommendations
  • Spam detection
  • Image/video classification
  • Trend prediction
  • Ad targeting

2. Content Safety & Moderation

Safety Features:

  • Automated spam detection
  • Abusive content filtering
  • Sensitive media detection
  • Misinformation labeling
  • Human-in-the-loop review

Security Architecture

Security Measures:

  • Authentication: OAuth 2.0, JWT tokens
  • Encryption: TLS 1.3, AES-256 at rest
  • DDoS Protection: Multi-layered defense
  • API Security: Rate limiting, key rotation
  • Account Security: MFA, login verification

Monitoring & Observability

Monitoring Metrics:

  • System Metrics: CPU, memory, disk, network
  • Application Metrics: Request latency, error rates
  • Business Metrics: Tweet volume, user engagement
  • Custom Metrics: Timeline generation time, fanout latency

Alerting:

  • Critical Alerts: Service outages, data loss
  • Warning Alerts: High latency, resource saturation
  • Anomaly Alerts: Unusual traffic patterns
  • SLA Monitoring: 99.9% uptime target
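A 99.9% target translates into a concrete downtime budget, which is what SLA monitoring ultimately tracks. The helper below is a hypothetical illustration of that arithmetic: 0.1% of a 30-day month is about 43 minutes, and 0.1% of a year is about 8.8 hours.

```python
def downtime_budget_minutes(availability_pct, period_days):
    """Allowed downtime, in minutes, for an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


monthly = downtime_budget_minutes(99.9, 30)  # about 43.2 minutes
yearly = downtime_budget_minutes(99.9, 365)  # about 525.6 minutes (roughly 8.8 hours)
```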

Infrastructure & DevOps

1. Multi-Cloud Architecture

Infrastructure Strategy:

  • Multi-cloud approach (AWS, GCP)
  • Hybrid cloud with on-premise data centers
  • Global CDN presence
  • Cost optimization across providers

2. Deployment Pipeline

Deployment Strategy:

  • Continuous Integration/Continuous Deployment (CI/CD)
  • Canary deployments for risk mitigation
  • Blue-green deployments for zero downtime
  • Automated rollback on failures
  • Feature flags for controlled rollouts
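Canary deployments and automated rollback need two pieces: a stable way to pick which users hit the new build, and a rule for deciding it is worse than the old one. This sketch assumes hash-based bucketing and an error-rate comparison; the function names, the 1.5x tolerance, and the minimum-traffic threshold are hypothetical.

```python
import hashlib


def in_canary(user_id: str, canary_percent: float) -> bool:
    """Deterministically assign a stable slice of users to the canary build."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return bucket < canary_percent * 100  # percent expressed in basis points


def should_rollback(canary_errors, canary_requests, baseline_errors, baseline_requests,
                    tolerance=1.5, min_requests=1000):
    """Roll back if the canary's error rate exceeds the baseline's by the tolerance factor."""
    if canary_requests < min_requests:
        return False  # not enough traffic to judge yet
    canary_rate = canary_errors / canary_requests
    baseline_rate = baseline_errors / max(baseline_requests, 1)
    return canary_rate > baseline_rate * tolerance
```

Hashing the user ID (rather than picking requests at random) keeps each user on the same build across requests, which avoids flip-flopping behavior and makes per-user errors attributable.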

3. Infrastructure as Code

IaC Components:

  • Terraform for cloud resource provisioning
  • Ansible for server configuration
  • Kubernetes for container orchestration
  • GitOps workflow for changes

4. Disaster Recovery

DR Metrics:

  • RTO (Recovery Time Objective): < 1 hour
  • RPO (Recovery Point Objective): < 5 minutes
  • Data Backup: Multiple geographic locations
  • Automated Failover: Cross-region redundancy

Performance Optimization

1. Timeline Generation Performance

Performance Techniques:

  • Predictive prefetching based on user behavior
  • Parallel data fetching from multiple sources
  • Edge caching for frequently accessed timelines
  • Progressive rendering for faster perceived load time
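Parallel fetching is what keeps timeline assembly latency close to the slowest dependency rather than the sum of all of them. The three fetchers below are hypothetical stand-ins that use `asyncio.sleep` in place of real cache or service calls.

```python
import asyncio


# Hypothetical data sources; asyncio.sleep simulates a network round trip.
async def fetch_home_tweets(user_id):
    await asyncio.sleep(0.05)
    return [f"tweet-{user_id}-{i}" for i in range(3)]


async def fetch_promoted(user_id):
    await asyncio.sleep(0.05)
    return ["promoted-1"]


async def fetch_suggestions(user_id):
    await asyncio.sleep(0.05)
    return ["suggested-account"]


async def build_timeline(user_id):
    # gather() runs the coroutines concurrently and returns results in argument order,
    # so total latency tracks the slowest call instead of the sum of all three.
    tweets, promoted, suggestions = await asyncio.gather(
        fetch_home_tweets(user_id),
        fetch_promoted(user_id),
        fetch_suggestions(user_id),
    )
    return {"tweets": tweets, "promoted": promoted, "suggestions": suggestions}


if __name__ == "__main__":
    print(asyncio.run(build_timeline(7)))
```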

2. Media Optimization

Media Features:

  • Automatic image compression (up to 85% size reduction)
  • Multiple format support (WebP, AVIF, JPEG)
  • Responsive images based on device
  • Video transcoding for multiple bitrates
  • Lazy loading for off-screen media

3. Database Query Optimization

Optimization Strategies:

  • Strategic indexing on high-traffic queries
  • Denormalization for read-heavy operations
  • Connection pooling to reduce overhead
  • Query result caching with TTL
  • Database sharding for horizontal scaling
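Query result caching with a TTL trades a bounded amount of staleness for fewer database round trips. A minimal sketch, assuming an in-process cache keyed by a query identifier; the `TTLCache` class, the key format, and the injectable clock are hypothetical.

```python
import time


class TTLCache:
    """Hypothetical query-result cache: entries expire ttl_seconds after being computed."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: skip the database
        value = compute()  # miss or expired: run the query again
        self._data[key] = (now + self.ttl, value)
        return value
```

The TTL is the staleness budget: a count such as followers can tolerate being a few seconds old, while data like account settings usually needs explicit invalidation on write instead.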

Content Delivery

1. CDN Architecture

CDN Features:

  • Global edge network (100+ locations)
  • Smart routing based on geography
  • Cache hit ratio > 95%
  • Image optimization and transformation
  • Video streaming with adaptive bitrate

2. Asset Pipeline

Asset Optimization:

  • JavaScript/CSS minification
  • Module bundling and code splitting
  • Brotli compression for text assets
  • Image sprites for icons
  • Content hashing for cache busting

Analytics & Business Intelligence

1. Analytics Pipeline

Analytics Use Cases:

  • User engagement metrics
  • Tweet performance analytics
  • Revenue and business metrics
  • A/B testing analysis
  • Fraud detection

2. Key Performance Indicators

Target Metrics:

  • DAU: 250+ million daily active users
  • Tweets/Day: 500+ million tweets
  • API Latency: P95 < 200ms
  • Availability: 99.9% uptime SLA

Mobile Architecture

1. Mobile App Architecture

Mobile Features:

  • Offline timeline caching
  • Background tweet synchronization
  • Image/video compression before upload
  • Progressive image loading
  • Battery and data optimization

2. Push Notification System

Notification Strategy:

  • Intelligent notification batching
  • User preference-based filtering
  • Quiet hours and do-not-disturb
  • Rich notifications with media
  • Deep linking to relevant content
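Batching and quiet hours can both be expressed as small pure functions. This sketch assumes events arrive as `(user, kind, tweet_id, timestamp)` tuples and are grouped into fixed 5-minute windows; the function names, the tuple layout, and the default 22:00 to 07:00 quiet window are hypothetical.

```python
from collections import defaultdict


def batch_notifications(events, window_seconds=300):
    """Collapse (user, kind, tweet_id, ts) events into (user, kind, tweet_id, count)
    summaries, one per user, kind, tweet, and time window."""
    groups = defaultdict(int)
    for user, kind, tweet_id, ts in events:
        groups[(user, kind, tweet_id, ts // window_seconds)] += 1
    return [
        (user, kind, tweet_id, count)
        for (user, kind, tweet_id, _window), count in sorted(groups.items())
    ]


def in_quiet_hours(local_hour, start=22, end=7):
    """True if local_hour (0-23) is inside the quiet window, including windows that wrap midnight."""
    if start <= end:
        return start <= local_hour < end
    return local_hour >= start or local_hour < end
```

A summary such as ("ana", "like", 1, 2) would then be rendered as a single "2 new likes" push instead of two separate notifications.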

Cost Optimization

Optimization Strategies

Cost Reduction Tactics:

  • Reserved instances for stable workloads (30-50% savings)
  • Spot instances for batch processing (up to 90% savings)
  • Aggressive caching to reduce compute load
  • Data compression and deduplication
  • Multi-cloud strategy for competitive pricing

API Architecture

1. REST API

API Features:

  • RESTful design principles
  • OAuth 2.0 authentication
  • Rate limiting (per endpoint, per user)
  • Webhook support for real-time updates
  • Comprehensive error handling

2. GraphQL API

GraphQL Benefits:

  • Flexible data fetching (request only needed fields)
  • Single request for multiple resources
  • Strong typing and schema validation
  • Efficient for mobile clients (reduced bandwidth)

Ads Platform Architecture

Ad Features:

  • Promoted tweets
  • Promoted accounts
  • Promoted trends
  • Real-time bidding (RTB)
  • Sophisticated targeting (demographics, interests, behaviors)
  • Performance analytics and reporting
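Real-time bidding is often illustrated with a generalized second-price (GSP) auction, in which ads are ranked by bid times predicted engagement and the winner pays just enough to beat the runner-up. The sketch below shows that generic mechanism for a single ad slot; it is not a description of Twitter/X's actual auction, and the `Bid` fields, the reserve price, and the rounding are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str
    max_bid: float  # maximum price per engagement (hypothetical units)
    predicted_ctr: float  # model-predicted engagement probability


def run_auction(bids, reserve=0.01):
    """Single-slot GSP sketch. Returns (winner, price) or None if no bid is eligible."""
    eligible = [b for b in bids if b.max_bid >= reserve and b.predicted_ctr > 0]
    if not eligible:
        return None
    ranked = sorted(eligible, key=lambda b: b.max_bid * b.predicted_ctr, reverse=True)
    winner = ranked[0]
    if len(ranked) == 1:
        return winner.advertiser, reserve  # no competition: pay the reserve
    runner_up = ranked[1]
    # Lowest bid at which the winner's score would still match the runner-up's.
    price = runner_up.max_bid * runner_up.predicted_ctr / winner.predicted_ctr
    return winner.advertiser, round(max(price, reserve), 4)
```

Ranking by bid times predicted engagement, rather than bid alone, means a relevant ad with a modest bid can beat an irrelevant one with a high bid, and the winner never pays more than its own maximum.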

Future Architecture Evolution

Emerging Technologies

Scalability Roadmap

  • User Growth: Support 1 billion+ users
  • Real-time Processing: Sub-second global propagation
  • AI Integration: Smarter recommendations and moderation
  • New Content Types: Audio, long-form, video
  • Global Expansion: Low-latency access worldwide

Conclusion

Twitter/X's architecture shows how to build an ultra-scalable, real-time social networking platform. The combination of intelligent caching, efficient fanout, robust data storage, and machine learning lets Twitter handle billions of interactions daily while keeping response times well under a second.

Key architectural principles:

  • Real-time First: Optimized for immediate content distribution
  • Horizontal Scalability: Services scale independently
  • Data Locality: Cache and store data near users
  • Fault Tolerance: Graceful degradation and quick recovery
  • Continuous Evolution: Adapting to new technologies and user needs

The platform continues to evolve, incorporating new features and optimizations to meet growing demands while maintaining the speed and reliability users expect from a real-time social network.

This document describes Twitter/X's publicly known systems alongside general industry best practices. Actual implementation details may differ and will keep changing as the platform evolves.