Design User System

Overview
1. Memcached
2. SQL vs NoSQL
3. Authentication
4. Friendship
User System
1. Scenario
  1. Features: Register, Login, Query (Most heavily used), User Information Modification,
  2. Assume 100M DAU
    1. Register / Login / Profile: Peak QPS = (100M * 0.1 / 86400) * 3 = 300
    2. Query: Peak QPS = (100M * 100 / 86400) * 3 = 300k
2. Service
  1. AuthService
  2. UserService (User information)
  3. FriendshipService
3. Storage
  1. Choices
    SQL / In-Memory NoSQL / In-Disk NoSQL / File System (Refer to System Design Overview & Design News Feed / Timeline)
  2. Character:
    1. Read more and write less -> Must use cache (In-memory NoSQL)
    2. User facing: Read more and write less
    3. Machine facing: Write more and read less
  3. Cache: (Key-Value Storage)
    1. Framework:
      1. Memcached (No persistence)
      2. Redis (Support persistence)
    2. High-Level idea:
      Register -> CPU Cache -> Memory -> File System -> Network
    3. Use case:
      Cache in Frontend -> Client / Browser.
      Cache in Backend -> Server.
    4. Memcached features:
      1. Time out
      2. Eviction: LRU cache / LFU cache
    5. How to optimize database query
      1. Read:
        
        try to get from the cache.
        
        If it does not exist, get from the database.
      2. Write: (Correct in most cases)
        
        Delete from the cache.
        
        Set from the database.
      3. Write Principle:
        
        The cache and the database should be consistent.
        
        Even if the current threads know the write failed. But other threads might read the dirty data before retry.
        
        Always set cache with a timeout to avoid inconsistency.
Authentication Service
1. Use the Session Table
  1. Schema: session_key; user_id; expire_at
  2. Process: (after login)
    1. Create a session. (hash value, unique. for example, uuid)
    2. Store the session key as a cookie and returns it to the browser.
    3. The user sends all the queries to the webserver with the cookie.
    4. The web server detects the session_key within the cookie. Mark login if the session_key is not expired.
    5. After the user logout, delete the cookie.
  3. Storage:
    1. Small website: all stored in the cache.
    2. Large website: cannot all stored in the cache. Store both in the database and the cache.
      1. Problem: if the cache server crashed, all users will need to login again. (session_keys are lost)
2. Summary:
  1. Read > Write: Memcached
  2. Write > Read: MySQL
  3. Both:
    1. More database servers
    2. Redis like cache-through database
      1. Redis: Cache-through. Store both in memory and database.
      2. Memcached: Cache-aside. Store only in memory. Users need to take care of the database. (Memcached + MySQL)
Friendship Service
1. Storage
  1. Unidirectional Friendship (Twitter)
    1. Schema: from_user_id, to_user_id
  2. Bidirectional Friendship (Facebook)
    1. Schema:
      1. Single: smaller_user_id, larger_user_id.
        
        Query all friend: twice (multi-index)
      2. Double: from_user_id, to_user_id.
        
        Query all friends: once (single-index, faster)
SQL vs NoSQL
1. Choice
  1. General cases: Both OK.
  2. Prefer SQL:
    1. Need to support transaction
    2. Need to use SQL features (Serialization, Secondary Index)
      1. SQL support numeric, datatime, char string, unicode string, binary, miscellaneous (XML/JSON/…)
  3. Prefer NoSQL
    1. NoSQL has better performance. 10x faster than SQL.
      1. Example: Friendship table.
2. Structure
  1. SQL
    1. Row-based storage. The columns are fixed in the schema.
  2. NoSQL
    1. Column-based storage. Actually, it can be infinitely large.
    2. Data are grid-based. Record = row_key + column_key + value (serialization).
3. Example of NoSQL
  1. Cassandra: Three-level structure
    1. Key (row_key + column_key) – Value storage
    2. row_key: hash_key. Used in sharding.
      Must have upon the query. hash_key. Cannot perform range queries.
    3. column_key: ordered_key.
      Optional upon the query. Can be a combination. Can perform range queries.
    4. value: string
      serialized information
  2. Use Cassandra:
    1. Friendship service:
      1. row_key: one user-id
      2. column_key: another user-id
      3. value: metadata
    2. Newsfeed:
      1. row_key: user-id
      2. column_key: timestamp
      3. value: other data (content…)

How to Scale

Single Point Failure
1. Sharding
  1. Use row keys to determine the location of the data.
    (Sharding Key / Partition Key)
    1. SQL doesn’t come with sharding.
    2. NoSQL (use Cassandra as an example) comes with sharding.
  2. Vertical Sharding: (Split Tables / Split Columns)
    1. Determine the location of the data: Columns.
    2. For example:
      User Table split into User Table (Seldom Change) and User Profile Table (Always Change)
    3. Problem:
      1. The table is too large.
      2. It cannot solve single point failure.
  3. Horizontal Sharding: (Split Rows)
    1. Determine the location of the data: ID mode N
    2. Problem:
      1. Not extendable. Involve massive amounts of data migration when adding/removing servers.
      2. Solution -> Consistent Hashing
2. Consistent Hashing (Optimized Horizontal Sharding)
  1. Select a large number.
  2. Use the remainder of the division to decide the location of the data.
  3. Low overhead migration.
3. Replica (3 times)
  1. Avoid data loss.
  2. Share data read requests.
  3. Store at different locations.

levendlee

Design User System

Design User System

User System

Authentication Service

Friendship Service

SQL vs NoSQL

How to Scale

Leave a comment Cancel reply

Design User System

Design User System

User System

Authentication Service

Friendship Service

SQL vs NoSQL

How to Scale

Share this:

Leave a comment Cancel reply