Introduction to System Design
- Design Questions
- System Design: Work Solution / Special Case / Analysis / Tradeoff / Knowledge Basse
- OOD Design
- API Design
- System Design
- Scenario: Design Features
Ask Features / QPS(Queries Per Second / DAU(Daily Active Users) / Interfaces - Service: Split a big system into small services
Split / Application / Module - Storage: Data storage and access
Schema / Data / SQL vs NoSQL vs File System - Scale: Large-scale system
Sharing / Optimize / Special Case
- Scenario: Design Features
- Scenario – ASK
-
- MAU / DAU: DAU = MAU x 0.5
- Concurrent users: DAU x Daily Query
- QPS (Queries Per Second):
- Average: Concurrent Users / Second Per Day (86,400)
- Peak: Average x 3
- Fast Growing: Peak x 2
- Read QPS / Write QPS
- Features:
- Enumerate all the features.
- Sort all the features. Pick the top features.
- Capacity
- Server
- QPS = 100: Single laptop
- QPS = 1k: Single web server. Single point failure problem.
- QPS = 1m: 1k web servers. Maintainance.
- Database
- SQL Database 1k QPS
- NoSQL Database (Cassandra) 10k QPS
- NoSQL Database (Memcached / In-memory Cache) 100k-1M QPS
- Server
- Service
- Replay (go through features) / Merge (merge features)
- The request got to Router
Example (Twitter): User Service / Tweet Service / Media Service / Friendship Service
- Storage
- Choices
- In-Memory (NoSQL): Non-persistent – Memcached / Redis
Example: Cache - SQL Database: Structural information – MySQL / PostgreSQL
Example: User Table - NoSQL: Litte / No structural information – MongoDB / Cassandra
Example: Tweets, Social Graph - File System: Non-structural information – AWS S3
Example: Picture, Video, Media Files
- In-Memory (NoSQL): Non-persistent – Memcached / Redis
- Steps (System Design = Algo + Storage)
- Select
Example (Twitter)- User Service: SQL. MySQL.
- Tweet Service: NoSQL: MongoDB.
- Media Service: File System: AWS S3.
- FriendShip Service: SQL / NoSQL
- Schema
Example (Twitter)- User Table: id, username, email, password.
- Friendship Table: from_user_id, to_user_id
- Tweet Table: id, user_id, content, created_at
- Media Service (No schema)
- Select
- Choices
Design News Feed
- Timeline vs News Feed
Publisher: Timeline. Observer: News Feed. - Push vs Pull
- Pull Model
- Method: (Tweet Table + Friendship Table)
- Request to the webserver.
- Fetch the friendship table.
- Fetch the top 100 tweets from all friends.
- Merge them into the top 100 news feeds.
- Complexity: N following.
- Get News Feed (N following)
- Read the database N + 1 times.
- Merge all tweets
- Post a tweet
- Write the database 1 time.
- Get News Feed (N following)
- Problem:
- Access database is 1k slower than memory
- Method: (Tweet Table + Friendship Table)
- Push Model
- Overall:
- Post tweet to all the news feed list of followers. Fanout.
- Users get 100 most recent tweets from his news feed list directly.
- Method: (Tweet Table + Friendship Table + News Feed Table)
- Post a tweet to the webserver.
- Web server inserts tweet to DB (tweet table).
- Web server sends the tweet to async tasks server.
- Async task server gets followers from DB (friendship table).
- Async task server fanouts (inserts new tweet to followers news feed). (Newsfeed table) (Message queue. Such as Kafka).
- Complexity:
- Get News Feed:
- Read the database 1 time.
- Post a tweet (N followers)
- Write the database N times.
- Pros: However, it can run in the background async.
- Cons: DB writing is slower than reading. The number of followers can be massive.
- Get News Feed:
- Problem:
- Delay (fanout is slow)
- Inactive users
- Duplicate data (not important)
- In practice
- Example:
- Facebook – Pull
- Instagram – Push + Pull
- Twitter – Pull
- Most use pulls:
- The user experience of message delay.
- Inactive users.
- Example:
- Scale
- Optimize
- Pull Model
- Cache the most recent user’s tweets (timeline) in memory.
- Access Memcached (100k QPS) is 100 faster than MySQL (10 QPS).
- Cache the most recent friend’s tweets (news feed) in memory.
- Get the new tweets between to queries.
- Cache the most recent user’s tweets (timeline) in memory.
- Push Model
- Waste disk: not important
- Inactive users: rank followers by weight
- Massive followers:
- Common users: Push
- Celebrity users (don’t change frequently): Pull
- Merge the two sources of tweets together. But do not store celebrity tweets into own user news feed list.
- (Don’t switch between pull/push)
- In practice:
- Start with push (small-scale) and ends with pull (large-scale).
- Push:
- Fewer resources. (No cache)
- Easy implementation. (No need to merge)
- No realtime requirements. (Async push)
- Fewer posts. (Can duplicate storage)
- Bidirectional followings. (No massive followers)
- Pull:
- Enough resources. (Cache)
- Realtime requirements.
- More posts. (Cannot duplicate storage)
- Single directional following. (Massive followers)
- Additional Features:
- How to stores likes:
- Add attributes of like_sums. comment_sums, retweet_sums.
- Local aggregation.
- Celebrity Post:
- Thunder herd problem.
- How to stores likes:
- Pull Model
- Optimize
- Overall:
- Pull Model
Leave a comment