TikTok is one of the most demanding real-time video platforms, with a massive global user base. Designing a scaled-down version of TikTok is a common system design interview challenge at companies like ByteDance, Meta, Google, and Amazon. In this post, we'll walk through how to approach a TikTok system design interview, focusing on functional and non-functional requirements, scaling strategies, and metrics-driven architecture.
Functional Requirements
Core User Interactions
A simplified TikTok-like platform should allow users to:
- ✅ Upload short-form videos
- ✅ Watch a stream of short videos
- ✅ Like videos
- ✅ Share videos
These features should provide a seamless, real-time experience to users globally.
Reduced Scope for Interview
To manage time and complexity during a 45–60 minute system design interview, it's reasonable to defer or skip certain functionalities.
Features to Skip (with interviewer’s consent):
- ❌ Sign-Up / Login / Auth (assume it's done)
- ❌ Full-fledged UI/Frontend design
- ❌ AI-powered feed ranking
- ❌ Chat and comments system
This keeps the scope focused and allows you to go deeper into scalable backend services.
Non-Functional Requirements
Beyond user-facing functionality, TikTok must meet rigorous performance and reliability standards.
🌍 High Availability
The system should handle failures gracefully and ensure uptime across regions.
⚙️ Scalability
With 1B+ global users, the system should scale horizontally to manage massive read/write traffic.
⚡ Low Latency
- Video load times should be near-instant (under 200ms).
- Video upload should be fast and reliable, even on mobile networks.
📱 Multi-device Support
The system must support access from:
- Smartphones (iOS, Android)
- Tablets
- Desktops
- TVs (via casting)
System Metrics
These are essential for understanding system design scale:
| Metric | Value |
|---|---|
| Total users | 1 billion+ |
| Countries served | 150+ |
| Videos viewed/day | 1 billion |
| Videos uploaded/day | 10 million |
| Success metric | Time spent per user (~1 hour/day) |
Detailed Assumptions
To ground our system design in real-world constraints, let's make some storage and size assumptions.
📹 Video Details
- Resolution: 1080 × 1920 px
- Avg Duration: 10 seconds
- Avg Size: 1 MB
📦 Annual Storage Estimate
Let's calculate the total video storage required per year:

Video Size = 1 MB
Videos/Day = 10 million
Videos/Year = 10M × 365 = 3.65 billion
Storage Required = 1 MB × 3.65B
= 3.65 × 10^15 bytes
= 3.65 PB/year

Even with optimizations, we estimate ~10 PB/year once overhead, backups, and replication are factored in.
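As a sanity check, here is the same back-of-envelope arithmetic in Python, using the figures from the assumptions above; the 3x headroom factor is an assumption to account for replication, backups, and transcoded copies:

```python
# Back-of-envelope storage estimate using the assumed figures above
VIDEO_SIZE_BYTES = 1 * 10**6   # ~1 MB per 10-second 1080x1920 video
UPLOADS_PER_DAY = 10_000_000   # 10M uploads/day

videos_per_year = UPLOADS_PER_DAY * 365         # 3.65 billion videos
raw_bytes = videos_per_year * VIDEO_SIZE_BYTES  # 3.65 x 10^15 bytes
print(f"Raw storage: {raw_bytes / 10**15:.2f} PB/year")  # 3.65 PB

# Assumed ~3x headroom for replication, backups, and renditions
print(f"Provisioned: {raw_bytes * 3 / 10**15:.2f} PB/year")  # ~11 PB, i.e. the ~10 PB figure above
```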
System Architecture
High-Level Overview
```
      [Mobile/Web Clients]
               |
         [API Gateway]
               |
    ┌──────────┼──────────┐
    |          |          |
Upload Svc  Feed Svc  Video Svc
    |          |          |
 Object       CDN     Metadata
 Storage                 DB
```
The core services communicate via REST/gRPC and are containerized and orchestrated with Kubernetes.
Video Upload Pipeline
Steps:
1. Client Uploads Video: multipart form or resumable chunked upload
2. Upload Service Receives: stores the file in an Object Store (S3/GCS), generates a video ID and thumbnail
3. Metadata Service Writes to DB: stores metadata like uploader ID, video length, tags, timestamp
4. Transcoding Pipeline: async jobs using tools like FFmpeg to generate multiple resolutions
5. Caching Popular Content: Redis/Memcached layer for hot content
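A minimal sketch of steps 1 and 2, assuming S3 via boto3; the bucket name and the `start_upload` endpoint are hypothetical. The client uploads directly to object storage with a presigned URL, so large video payloads never flow through the app servers:

```python
import uuid
import boto3  # assumed: AWS SDK; GCS would use google-cloud-storage instead

s3 = boto3.client("s3")
BUCKET = "tiktok-raw-uploads"  # hypothetical bucket name

def start_upload(uploader_id: str) -> dict:
    """Issue a short-lived presigned URL so the client uploads directly
    to object storage instead of streaming bytes through app servers."""
    video_id = uuid.uuid4().hex
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": f"raw/{uploader_id}/{video_id}.mp4"},
        ExpiresIn=600,  # 10-minute upload window
    )
    # The metadata row and thumbnail job are created once the client
    # confirms the upload completed.
    return {"video_id": video_id, "upload_url": url}
```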
Video Playback & CDN
- Client requests video by ID
- Edge CDN (Cloudflare, Akamai) serves cached video
- Fallback to Origin Store for cold content
- Pre-fetching used to preload next videos
CDN Performance Goal:
⏱ <100ms video load from any region
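One way the pre-fetching step might look on the server side: a playback endpoint that returns the CDN URL for the current video plus the next few feed entries so the client can preload them. The CDN domain and HLS paths are placeholders:

```python
CDN_BASE = "https://cdn.example.com"  # placeholder edge domain

def playback_urls(video_id: str, feed: list[str], prefetch: int = 2) -> dict:
    """Return the CDN URL for the current video plus the next few feed
    entries so the client can preload them while the user watches."""
    def to_url(vid: str) -> str:
        return f"{CDN_BASE}/videos/{vid}/master.m3u8"

    idx = feed.index(video_id)  # assumes the video is in the user's feed
    return {
        "play": to_url(video_id),
        "prefetch": [to_url(v) for v in feed[idx + 1 : idx + 1 + prefetch]],
    }
```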
Like, Share, and Engagement Services
Likes
- Handled via a dedicated Like Service
- Writes to NoSQL (Cassandra / DynamoDB) for high write throughput
- Updates Redis Cache for fast like count fetch
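A sketch of that write path, assuming the redis-py and kafka-python clients; topic and key names are illustrative. The counter increments synchronously in Redis for the fast read path, while the durable NoSQL write happens asynchronously through a queue consumer:

```python
import json
import redis
from kafka import KafkaProducer  # assumed: kafka-python client

cache = redis.Redis()
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode(),
)

def like_video(user_id: str, video_id: str) -> int:
    """Increment the cached counter for the fast read path; the durable
    Cassandra/DynamoDB write happens via a consumer on the topic."""
    count = cache.incr(f"likes:{video_id}")
    producer.send("likes", {"user_id": user_id, "video_id": video_id})
    return count
```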
Shares
- Triggers event to Analytics Service
- May include social sharing integration (optional scope)
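Shares can follow the same pattern as likes: a fire-and-forget event that the Analytics Service consumes downstream. A small sketch, with the same assumed kafka-python setup and an illustrative topic name:

```python
import json
from kafka import KafkaProducer  # assumed: kafka-python client

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode(),
)

def share_video(user_id: str, video_id: str, channel: str) -> None:
    """Fire-and-forget share event; the Analytics Service consumes the topic."""
    producer.send("shares", {
        "user_id": user_id, "video_id": video_id, "channel": channel,
    })
```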
Scaling Strategies
Horizontal Scaling
- Upload, feed, metadata, and like services scale independently
- Kubernetes + Auto Scaling Groups
Partitioning Strategy
- Partition video metadata by userID / region
- Shard video object storage by videoID prefix
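One way to implement both rules, with an assumed shard count of 64: a stable hash keeps all of a user's metadata on one shard, and a short videoID prefix spreads object keys across storage partitions to avoid hot spots:

```python
import hashlib

NUM_METADATA_SHARDS = 64  # assumed shard count

def metadata_shard(user_id: str) -> int:
    """Stable hash so a user's video metadata always lands on one shard."""
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_METADATA_SHARDS

def storage_key(video_id: str) -> str:
    """Prefix object keys with the first videoID characters to spread
    writes across object-storage partitions."""
    return f"{video_id[:2]}/{video_id}.mp4"
```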
Async Processing
- Use message queues (Kafka/PubSub) for:
- Transcoding
- Analytics
- Notifications
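A minimal transcoding worker sketch, assuming kafka-python and ffmpeg on the worker host; the topic, consumer group, and output naming are illustrative:

```python
import json
import subprocess
from kafka import KafkaConsumer  # assumed: kafka-python client

consumer = KafkaConsumer(
    "transcode-jobs",  # illustrative topic name
    bootstrap_servers="kafka:9092",
    group_id="transcoders",
    value_deserializer=lambda v: json.loads(v.decode()),
)

for msg in consumer:
    job = msg.value  # e.g. {"video_id": "...", "src": "raw/abc.mp4"}
    for height in (1080, 720, 480):
        # One ffmpeg run per rendition; a real pipeline parallelizes
        # these and uploads the outputs back to object storage.
        subprocess.run(
            ["ffmpeg", "-i", job["src"], "-vf", f"scale=-2:{height}",
             f"{job['video_id']}_{height}p.mp4"],
            check=True,
        )
```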
Storage Design
| Component | Storage Type | Technology |
|---|---|---|
| Video Files | Object Storage | Amazon S3, GCS, MinIO |
| Metadata | Relational DB | PostgreSQL, MySQL |
| Likes / Shares | NoSQL Key-Value | Redis, Cassandra |
| Feed Generation | Graph/Time-series | Neo4j, TimescaleDB |
| Caching Layer | In-Memory Cache | Redis, Memcached |
Caching & Optimization
Redis Use Cases:
- Caching feed results
- Like counts
- Hot video metadata
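All three use cases follow the same cache-aside pattern. A sketch for hot video metadata, with a hypothetical `db.fetch_video` accessor and an assumed 5-minute TTL:

```python
import json
import redis

cache = redis.Redis()

def get_video_metadata(video_id: str, db) -> dict:
    """Cache-aside read: try Redis first, fall back to the metadata DB."""
    key = f"video:{video_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    row = db.fetch_video(video_id)           # hypothetical DB accessor
    cache.set(key, json.dumps(row), ex=300)  # 5-minute TTL for hot content
    return row
```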
Content Delivery:
- Use CloudFront/Cloudflare for geographic edge delivery
- Enable pre-warming of cache during upload/transcode
Compression:
- Store videos using H.264 or AV1 codecs
- Apply adaptive bitrate streaming for slow networks
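A sketch of HLS packaging for adaptive bitrate streaming, assuming ffmpeg is available; the bitrate and segment length are illustrative, and a real bitrate ladder repeats this per rendition with a master playlist tying them together:

```python
import subprocess

def package_hls(src: str, out_dir: str) -> None:
    """Package one H.264 rendition into 4-second HLS segments for
    adaptive playback."""
    subprocess.run(
        ["ffmpeg", "-i", src,
         "-c:v", "libx264", "-b:v", "2500k",  # H.264 at ~2.5 Mbps
         "-c:a", "aac",
         "-hls_time", "4",                    # segment length in seconds
         "-hls_playlist_type", "vod",
         f"{out_dir}/index.m3u8"],
        check=True,
    )
```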
Monitoring and Metrics
Key Metrics to Track
| Metric | Description |
|---|---|
| Upload Success Rate | % of successfully uploaded videos |
| Video Play Latency | Time to first frame |
| Time Spent per User | Engagement metric |
| CDN Cache Hit Ratio | Edge performance |
| Transcoding Queue Lag | Bottleneck indicator |
| Feed Query Time | Backend performance |
Tools
- Prometheus + Grafana for backend metrics
- ELK stack / Datadog for logs
- Jaeger for distributed tracing
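A sketch of instrumenting two of the metrics above with the Python prometheus_client library; the metric names and scrape port are assumptions:

```python
from prometheus_client import Counter, Histogram, start_http_server

UPLOADS = Counter("video_uploads_total", "Upload attempts", ["status"])
PLAY_LATENCY = Histogram("video_play_latency_seconds", "Time to first frame")

start_http_server(9100)  # exposes /metrics for Prometheus to scrape

def record_upload(ok: bool) -> None:
    # Upload Success Rate = success / (success + failure), computed in PromQL
    UPLOADS.labels(status="success" if ok else "failure").inc()

@PLAY_LATENCY.time()  # observes handler duration into the histogram
def serve_first_frame(video_id: str) -> None:
    ...  # playback handler body
```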
Conclusion
Designing TikTok is a complex but exciting system design challenge. In interviews, clarity and prioritization are key. Start by defining a reduced but impactful feature set, then focus on high-level architecture, data flows, and scalability strategies.
Remember to validate your assumptions and bring metrics-based thinking into your answers.
Learn More
👉 For more such in-depth posts, system design breakdowns, and code samples, visit https://codeandalgo.com