Introduction to System Design

Design Questions
1. System Design: Work Solution / Special Case / Analysis / Tradeoff / Knowledge Basse
2. OOD Design
3. API Design
System Design
1. Scenario: Design Features
  Ask Features / QPS(Queries Per Second / DAU(Daily Active Users) / Interfaces
2. Service: Split a big system into small services
  Split / Application / Module
3. Storage: Data storage and access
  Schema / Data / SQL vs NoSQL vs File System
4. Scale: Large-scale system
  Sharing / Optimize / Special Case
Scenario – ASK

1. MAU / DAU: DAU = MAU x 0.5
2. Concurrent users: DAU x Daily Query
3. QPS (Queries Per Second):
  1. Average: Concurrent Users / Second Per Day (86,400)
  2. Peak: Average x 3
  3. Fast Growing: Peak x 2
  4. Read QPS / Write QPS
4. Features:
  1. Enumerate all the features.
  2. Sort all the features. Pick the top features.
5. Capacity
  1. Server
    1. QPS = 100: Single laptop
    2. QPS = 1k: Single web server. Single point failure problem.
    3. QPS = 1m: 1k web servers. Maintainance.
  2. Database
    1. SQL Database 1k QPS
    2. NoSQL Database (Cassandra) 10k QPS
    3. NoSQL Database (Memcached / In-memory Cache) 100k-1M QPS
Service
1. Replay (go through features) / Merge (merge features)
2. The request got to Router
  Example (Twitter): User Service / Tweet Service / Media Service / Friendship Service
Storage
1. Choices
  1. In-Memory (NoSQL): Non-persistent – Memcached / Redis
    Example: Cache
  2. SQL Database: Structural information – MySQL / PostgreSQL
    Example: User Table
  3. NoSQL: Litte / No structural information – MongoDB / Cassandra
    Example: Tweets, Social Graph
  4. File System: Non-structural information – AWS S3
    Example: Picture, Video, Media Files
2. Steps (System Design = Algo + Storage)
  1. Select
    Example (Twitter)
    1. User Service: SQL. MySQL.
    2. Tweet Service: NoSQL: MongoDB.
    3. Media Service: File System: AWS S3.
    4. FriendShip Service: SQL / NoSQL
  2. Schema
    Example (Twitter)
    1. User Table: id, username, email, password.
    2. Friendship Table: from_user_id, to_user_id
    3. Tweet Table: id, user_id, content, created_at
    4. Media Service (No schema)

Design News Feed

Timeline vs News Feed
Publisher: Timeline. Observer: News Feed.
Push vs Pull
1. Pull Model
  1. Method: (Tweet Table + Friendship Table)
    1. Request to the webserver.
    2. Fetch the friendship table.
    3. Fetch the top 100 tweets from all friends.
    4. Merge them into the top 100 news feeds.
  2. Complexity: N following.
    1. Get News Feed (N following)
      1. Read the database N + 1 times.
      2. Merge all tweets
    2. Post a tweet
      1. Write the database 1 time.
  3. Problem:
    1. Access database is 1k slower than memory
2. Push Model
  1. Overall:
    1. Post tweet to all the news feed list of followers. Fanout.
    2. Users get 100 most recent tweets from his news feed list directly.
  2. Method: (Tweet Table + Friendship Table + News Feed Table)
    1. Post a tweet to the webserver.
    2. Web server inserts tweet to DB (tweet table).
    3. Web server sends the tweet to async tasks server.
    4. Async task server gets followers from DB (friendship table).
    5. Async task server fanouts (inserts new tweet to followers news feed). (Newsfeed table) (Message queue. Such as Kafka).
  3. Complexity:
    1. Get News Feed:
      1. Read the database 1 time.
    2. Post a tweet (N followers)
      1. Write the database N times.
      2. Pros: However, it can run in the background async.
      3. Cons: DB writing is slower than reading. The number of followers can be massive.
  4. Problem:
    1. Delay (fanout is slow)
    2. Inactive users
    3. Duplicate data (not important)
  5. In practice
    1. Example:
      1. Facebook – Pull
      2. Instagram – Push + Pull
      3. Twitter – Pull
    2. Most use pulls:
      1. The user experience of message delay.
      2. Inactive users.
  6. Scale
    1. Optimize
      1. Pull Model
        
        Cache the most recent user’s tweets (timeline) in memory.
        
        Access Memcached (100k QPS) is 100 faster than MySQL (10 QPS).
        
        Cache the most recent friend’s tweets (news feed) in memory.
        
        Get the new tweets between to queries.
      2. Push Model
        
        Waste disk: not important
        
        Inactive users: rank followers by weight
        
        Massive followers:
        
        Common users: Push
        
        Celebrity users (don’t change frequently): Pull
        
        Merge the two sources of tweets together. But do not store celebrity tweets into own user news feed list.
        
        (Don’t switch between pull/push)
      3. In practice:
        
        Start with push (small-scale) and ends with pull (large-scale).
        
        Push:
        
        Fewer resources. (No cache)
        
        Easy implementation. (No need to merge)
        
        No realtime requirements. (Async push)
        
        Fewer posts. (Can duplicate storage)
        
        Bidirectional followings. (No massive followers)
        
        Pull:
        
        Enough resources. (Cache)
        
        Realtime requirements.
        
        More posts. (Cannot duplicate storage)
        
        Single directional following. (Massive followers)
      4. Additional Features:
        
        How to stores likes:
        
        Add attributes of like_sums. comment_sums, retweet_sums.
        
        Local aggregation.
        
        Celebrity Post:
        
        Thunder herd problem.

levendlee

System Design Overview & Design News Feed / Timeline

Introduction to System Design

Design News Feed

Leave a comment Cancel reply

System Design Overview & Design News Feed / Timeline

Introduction to System Design

Design News Feed

Share this:

Leave a comment Cancel reply