Stay humble. Stay hungry. Stay foolish.

Introduction to System Design

  1. Design Questions
    1. System Design: Work Solution / Special Case / Analysis / Tradeoff / Knowledge Basse
    2. OOD Design
    3. API Design
  2. System Design
    1. Scenario: Design Features
      Ask Features / QPS(Queries Per Second / DAU(Daily Active Users) / Interfaces
    2. Service:  Split a big system into small services
      Split / Application / Module
    3. Storage: Data storage and access
      Schema / Data / SQL vs NoSQL vs File System
    4. Scale: Large-scale system
      Sharing / Optimize / Special Case
  3. Scenario – ASK
    1. MAU / DAU: DAU = MAU x 0.5
    2. Concurrent users: DAU x Daily Query
    3. QPS (Queries Per Second):
      1. Average: Concurrent Users / Second Per Day (86,400)
      2. Peak: Average x 3
      3. Fast Growing: Peak x 2
      4. Read QPS / Write QPS
    4. Features:
      1. Enumerate all the features.
      2. Sort all the features. Pick the top features.
    5. Capacity
      1. Server
        1. QPS = 100: Single laptop
        2. QPS = 1k: Single web server. Single point failure problem.
        3. QPS = 1m: 1k web servers. Maintainance.
      2. Database
        1. SQL Database 1k QPS
        2. NoSQL Database (Cassandra) 10k QPS
        3. NoSQL Database (Memcached / In-memory Cache) 100k-1M QPS
  1. Service
    1. Replay (go through features) / Merge (merge features)
    2. The request got to Router
      Example (Twitter): User Service / Tweet Service / Media Service / Friendship Service
  2. Storage
    1. Choices
      1. In-Memory (NoSQL): Non-persistent – Memcached / Redis
        Example: Cache
      2. SQL Database: Structural information – MySQL / PostgreSQL
        Example:  User Table
      3. NoSQL: Litte / No structural information – MongoDB / Cassandra
        Example: Tweets, Social Graph
      4. File System: Non-structural information – AWS S3
        Example: Picture, Video, Media Files
    2. Steps (System Design = Algo + Storage)
      1. Select
        Example (Twitter)

        1. User Service: SQL. MySQL.
        2. Tweet Service: NoSQL: MongoDB.
        3. Media Service: File System: AWS S3.
        4. FriendShip Service: SQL / NoSQL
      2. Schema
        Example (Twitter)

        1. User Table: id, username, email, password.
        2. Friendship Table: from_user_id, to_user_id
        3. Tweet Table: id, user_id, content, created_at
        4. Media Service (No schema)

Design News Feed

  1. Timeline vs News Feed
    Publisher: Timeline. Observer: News Feed.
  2. Push vs Pull
    1. Pull Model
      1. Method: (Tweet Table + Friendship Table)
        1. Request to the webserver.
        2. Fetch the friendship table.
        3. Fetch the top 100 tweets from all friends.
        4. Merge them into the top 100 news feeds.
      2. Complexity: N following.
        1. Get News Feed (N following)
          1. Read the database N + 1 times.
          2. Merge all tweets
        2. Post a tweet
          1. Write the database 1 time.
      3. Problem:
        1. Access database is 1k slower than memory
    2. Push Model
      1. Overall:
        1. Post tweet to all the news feed list of followers. Fanout.
        2. Users get 100 most recent tweets from his news feed list directly.
      2. Method: (Tweet Table + Friendship Table + News Feed Table)
        1. Post a tweet to the webserver.
        2. Web server inserts tweet to DB (tweet table).
        3. Web server sends the tweet to async tasks server.
        4. Async task server gets followers from DB (friendship table).
        5. Async task server fanouts (inserts new tweet to followers news feed). (Newsfeed table) (Message queue. Such as Kafka).
      3. Complexity:
        1. Get News Feed:
          1. Read the database 1 time.
        2. Post a tweet (N followers)
          1. Write the database N times.
          2. Pros: However, it can run in the background async.
          3. Cons: DB writing is slower than reading. The number of followers can be massive.
      4. Problem:
        1. Delay (fanout is slow)
        2. Inactive users
        3. Duplicate data (not important)
      5. In practice
        1. Example:
          1. Facebook – Pull
          2. Instagram – Push + Pull
          3. Twitter – Pull
        2. Most use pulls:
          1. The user experience of message delay.
          2. Inactive users.
      6. Scale
        1. Optimize
          1. Pull Model
            1. Cache the most recent user’s tweets (timeline) in memory.
              1. Access Memcached (100k QPS) is 100 faster than MySQL (10 QPS).
            2. Cache the most recent friend’s tweets (news feed) in memory.
              1. Get the new tweets between to queries.
          2. Push Model
            1. Waste disk: not important
            2. Inactive users: rank followers by weight
            3. Massive followers:
              1. Common users: Push
              2. Celebrity users (don’t change frequently): Pull
              3. Merge the two sources of tweets together. But do not store celebrity tweets into own user news feed list.
            4. (Don’t switch between pull/push)
          3. In practice:
            1. Start with push (small-scale) and ends with pull (large-scale).
            2. Push:
              1. Fewer resources. (No cache)
              2. Easy implementation. (No need to merge)
              3. No realtime requirements. (Async push)
              4. Fewer posts. (Can duplicate storage)
              5. Bidirectional followings. (No massive followers)
            3. Pull:
              1. Enough resources. (Cache)
              2. Realtime requirements.
              3. More posts. (Cannot duplicate storage)
              4. Single directional following. (Massive followers)
          4. Additional Features:
            1. How to stores likes:
              1. Add attributes of like_sums. comment_sums, retweet_sums.
              2. Local aggregation.
            2. Celebrity Post:
              1. Thunder herd problem.

Tags

Leave a comment