Design User System
- Overview
- Memcached
- SQL vs NoSQL
- Authentication
- Friendship
User System
- Scenario
- Features: Register, Login, Query (most heavily used), User Information Modification
- Assume 100M DAU
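- Register / Login / Profile (assume 0.1 requests per user per day, peak = 3x average): Peak QPS = (100M * 0.1 / 86400) * 3 ≈ 350, on the order of 300
- Query (assume 100 requests per user per day, peak = 3x average): Peak QPS = (100M * 100 / 86400) * 3 ≈ 350k, on the order of 300k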
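The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the function name `peak_qps` and the parameter names are made up for illustration, and the per-user request rates and the 3x peak factor are the assumptions already used in the notes.

```python
SECONDS_PER_DAY = 86_400


def peak_qps(dau: int, requests_per_user_per_day: float, peak_factor: float = 3.0) -> float:
    """Average QPS over one day, scaled by an assumed peak-to-average factor."""
    average_qps = dau * requests_per_user_per_day / SECONDS_PER_DAY
    return average_qps * peak_factor


dau = 100_000_000
account_qps = peak_qps(dau, 0.1)  # register / login / profile updates
query_qps = peak_qps(dau, 100)    # profile queries

print(f"register/login/profile peak QPS: {account_qps:.0f}")
print(f"query peak QPS: {query_qps:.0f}")
```

The exact values come out near 350 and 350k. For capacity planning only the order of magnitude matters, which is why the notes round down to 300 and 300k.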
- Service
- AuthService
- UserService (User information)
- FriendshipService
- Storage
- Choices
SQL / In-Memory NoSQL / On-Disk NoSQL / File System (Refer to System Design Overview & Design News Feed / Timeline)
- Characteristics:
- Read more and write less -> Must use cache (In-memory NoSQL)
- User facing: Read more and write less
- Machine facing: Write more and read less
- Cache: (Key-Value Storage)
- Framework:
- Memcached (No persistence)
- Redis (Support persistence)
- High-Level idea:
Register -> CPU Cache -> Memory -> File System -> Network
- Use case:
Cache in Frontend -> Client / Browser.
Cache in Backend -> Server.
- Memcached features:
- Timeout: each entry expires after its time-to-live.
- Eviction: LRU cache / LFU cache
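The two features above (timeout and LRU eviction) can be sketched in a small in-process cache. This is not how Memcached is implemented. The class name `LRUCache`, its methods, and the injectable `clock` parameter are assumptions made for illustration.

```python
import time
from collections import OrderedDict


class LRUCache:
    """Toy key-value cache with a per-entry timeout and LRU eviction."""

    def __init__(self, capacity, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock           # injectable so expiry can be tested
        self._data = OrderedDict()   # key -> (value, expire_at), oldest first

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expire_at = item
        if expire_at <= self.clock():
            del self._data[key]      # timed out: treat as a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self.clock() + ttl)
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.set("user:1", {"name": "alice"}, ttl=60)
print(cache.get("user:1"))
```

An LFU cache would instead track an access count per key and evict the key with the smallest count.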
- How to optimize database query
- Read:
- Try to get the value from the cache.
- If it is not there, get it from the database and set it in the cache.
- Write: (Correct in most cases)
- Delete the key from the cache.
- Write the new value to the database.
- Write Principle:
- The cache and the database should be consistent.
- Even if the current thread knows the write failed, other threads might read the dirty data before the retry.
- Always set cache with a timeout to avoid inconsistency.
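The read and write paths above can be sketched as follows. `FakeCache` is a hypothetical stand-in for a Memcached client (it records the timeout but never expires anything), the database is a plain dict, and the key format `user:<id>` and the 300-second timeout are assumptions.

```python
class FakeCache:
    """Stand-in for a cache client; stores (value, ttl) but never expires."""

    def __init__(self):
        self.items = {}

    def get(self, key):
        entry = self.items.get(key)
        return None if entry is None else entry[0]

    def set(self, key, value, ttl):
        self.items[key] = (value, ttl)

    def delete(self, key):
        self.items.pop(key, None)


def get_user(cache, database, user_id, ttl=300):
    """Read path: cache first, then the database, then fill the cache."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is not None:
        return user
    user = database.get(user_id)
    if user is not None:
        cache.set(key, user, ttl)  # always set with a timeout
    return user


def update_user(cache, database, user_id, user):
    """Write path: delete from the cache first, then write the database."""
    cache.delete(f"user:{user_id}")
    database[user_id] = user


cache, database = FakeCache(), {1: {"name": "alice"}}
print(get_user(cache, database, 1))
update_user(cache, database, 1, {"name": "bob"})
print(get_user(cache, database, 1))
```

Deleting before writing means a failed database write leaves the cache empty rather than stale. A concurrent reader can still refill the cache with the old value between the two steps, which is why the timeout is kept as a safety net.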
Authentication Service
- Use the Session Table
- Schema: session_key; user_id; expire_at
- Process: (after login)
- Create a session with a unique key (a hash value, for example a UUID).
- Store the session key in a cookie and return it to the browser.
- The user sends every request to the web server with the cookie.
- The web server reads the session_key from the cookie and treats the user as logged in if the session has not expired.
- After the user logs out, delete the cookie.
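The process above can be sketched with an in-memory session table. The function names, the 7-day lifetime, and the dict standing in for the table are assumptions. Deleting the session row on logout (in addition to the cookie) is also an assumption that goes slightly beyond the notes.

```python
import time
import uuid

SESSION_TTL_SECONDS = 7 * 24 * 3600  # assumed session lifetime

# Stand-in for the session table: session_key -> (user_id, expire_at)
sessions = {}


def login(user_id, now=None):
    """Create a session and return the key that would be sent as a cookie."""
    now = time.time() if now is None else now
    session_key = uuid.uuid4().hex
    sessions[session_key] = (user_id, now + SESSION_TTL_SECONDS)
    return session_key


def current_user(session_key, now=None):
    """Return the logged-in user_id, or None if missing or expired."""
    now = time.time() if now is None else now
    row = sessions.get(session_key)
    if row is None:
        return None
    user_id, expire_at = row
    if expire_at <= now:
        del sessions[session_key]
        return None
    return user_id


def logout(session_key):
    """Drop the session row; the browser cookie is deleted separately."""
    sessions.pop(session_key, None)


cookie_value = login(user_id=42)
print(current_user(cookie_value))
```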
- Storage:
- Small website: all stored in the cache.
- Large website: the sessions cannot all be stored in the cache. Store them in both the database and the cache.
- Problem: if the cache server crashes and sessions are stored only there, all users need to log in again (the session_keys are lost).
- Summary:
- Read > Write: Memcached
- Write > Read: MySQL
- Both:
- More database servers
- A Redis-like cache-through database
- Redis: cache-through. Data is stored both in memory and in persistent storage.
- Memcached: cache-aside. Data is stored only in memory, so the application has to take care of the database itself (Memcached + MySQL).
Friendship Service
- Storage
- Unidirectional Friendship (Twitter)
- Schema: from_user_id, to_user_id
- Bidirectional Friendship (Facebook)
- Schema:
- Single: smaller_user_id, larger_user_id.
- Query all friends: two lookups, one per column (multi-index)
- Double: from_user_id, to_user_id.
- Query all friends: once (single-index, faster)
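The difference between the two bidirectional schemas can be seen in a small SQLite sketch. The table names `friendship_single` and `friendship_double`, the index name, and the helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friendship_single (
    smaller_user_id INTEGER,
    larger_user_id  INTEGER,
    PRIMARY KEY (smaller_user_id, larger_user_id)
);
-- second index so lookups by larger_user_id are also indexed
CREATE INDEX idx_single_larger ON friendship_single (larger_user_id);

CREATE TABLE friendship_double (
    from_user_id INTEGER,
    to_user_id   INTEGER,
    PRIMARY KEY (from_user_id, to_user_id)
);
""")


def add_friend_single(a, b):
    # One row per friendship, always stored as (smaller, larger).
    conn.execute("INSERT INTO friendship_single VALUES (?, ?)", (min(a, b), max(a, b)))


def add_friend_double(a, b):
    # Two rows per friendship, one in each direction.
    conn.executemany("INSERT INTO friendship_double VALUES (?, ?)", [(a, b), (b, a)])


def friends_single(user_id):
    # The user may sit in either column, so both must be searched.
    rows = conn.execute(
        "SELECT larger_user_id FROM friendship_single WHERE smaller_user_id = ? "
        "UNION "
        "SELECT smaller_user_id FROM friendship_single WHERE larger_user_id = ?",
        (user_id, user_id),
    )
    return sorted(row[0] for row in rows)


def friends_double(user_id):
    # A single lookup on the leading column of the primary key.
    rows = conn.execute(
        "SELECT to_user_id FROM friendship_double WHERE from_user_id = ?", (user_id,)
    )
    return sorted(row[0] for row in rows)


for a, b in [(1, 2), (3, 1), (2, 3)]:
    add_friend_single(a, b)
    add_friend_double(a, b)

print(friends_single(1), friends_double(1))
```

The double schema uses twice the rows and needs both inserts to succeed together, in exchange for the simpler and faster read.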
SQL vs NoSQL
- Choice
- General cases: Both OK.
- Prefer SQL:
- Need to support transaction
- Need to use SQL features (Serialization, Secondary Index)
- SQL supports numeric, datetime, char string, unicode string, binary, miscellaneous (XML/JSON/…) types
- Prefer NoSQL
- NoSQL usually has better performance for simple key-based access. Rule of thumb: about 10x faster than SQL.
- Example: Friendship table.
- Structure
- SQL
- Row-based storage. The columns are fixed in the schema.
- NoSQL
- Column-based storage. The columns are not fixed, and a row can in principle hold an unbounded number of them.
- Data are grid-based. Record = row_key + column_key + value (serialization).
- Example of NoSQL
- Cassandra: Three-level structure
- Key (row_key + column_key) – Value storage
- row_key: hash_key. Used in sharding. Must be given in every query. Cannot perform range queries.
- column_key: ordered_key. Optional in a query. Can be a combination of fields. Can perform range queries.
- value: string. Serialized information.
- Use Cassandra:
- Friendship service:
- row_key: one user-id
- column_key: another user-id
- value: metadata
- Newsfeed:
- row_key: user-id
- column_key: timestamp
- value: other data (content…)
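The row_key / column_key / value structure can be sketched with a dict of sorted lists. This only mimics the data model, not Cassandra itself: the class `WideRowStore` and its methods are hypothetical, and JSON stands in for whatever serialization is actually used.

```python
import bisect
import json
from collections import defaultdict


class WideRowStore:
    """Toy store: row_key is looked up exactly, column_key is kept ordered."""

    def __init__(self):
        # row_key -> list of (column_key, serialized value), sorted by column_key
        self._rows = defaultdict(list)

    def put(self, row_key, column_key, value):
        columns = self._rows[row_key]
        serialized = json.dumps(value)
        i = bisect.bisect_left(columns, (column_key,))
        if i < len(columns) and columns[i][0] == column_key:
            columns[i] = (column_key, serialized)      # overwrite existing cell
        else:
            columns.insert(i, (column_key, serialized))

    def range(self, row_key, start, end):
        """All cells of one row with start <= column_key < end."""
        columns = self._rows.get(row_key, [])
        lo = bisect.bisect_left(columns, (start,))
        hi = bisect.bisect_left(columns, (end,))
        return [(c, json.loads(v)) for c, v in columns[lo:hi]]


# Newsfeed example: row_key = user id, column_key = timestamp.
feed = WideRowStore()
feed.put("user-1", 300, {"content": "third"})
feed.put("user-1", 100, {"content": "first"})
feed.put("user-1", 200, {"content": "second"})
print(feed.range("user-1", 100, 300))
```

The row_key must be known exactly, which mirrors the "must be given in every query" rule, while the ordered column_key is what makes the timestamp range query cheap.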
How to Scale
- Single Point of Failure
- Sharding
- Use row keys to determine the location of the data. (Sharding Key / Partition Key)
- SQL doesn’t come with sharding.
- NoSQL (use Cassandra as an example) comes with sharding.
- Vertical Sharding: (Split Tables / Split Columns)
- Determine the location of the data: Columns.
- For example:
User Table split into User Table (Seldom Change) and User Profile Table (Always Change)
- Problem:
- A single table can still be too large.
- It cannot solve the single point of failure.
- Horizontal Sharding: (Split Rows)
- Determine the location of the data: ID mod N
- Problem:
- Not extensible. Adding or removing servers changes N and involves massive amounts of data migration.
- Solution -> Consistent Hashing
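The migration problem with ID mod N is easy to measure. A small sketch, with `shard_for` and `moved_fraction` as hypothetical helper names and sequential integer IDs as an assumption:

```python
def shard_for(user_id, num_servers):
    """Horizontal sharding by remainder: ID mod N."""
    return user_id % num_servers


def moved_fraction(num_keys, old_servers, new_servers):
    """Fraction of keys whose shard changes when N changes."""
    moved = sum(
        1
        for user_id in range(num_keys)
        if shard_for(user_id, old_servers) != shard_for(user_id, new_servers)
    )
    return moved / num_keys


# Growing from 3 to 4 servers relocates three quarters of the keys.
print(moved_fraction(12_000, 3, 4))
```

Ideally only about a quarter of the data (the new server's share) would move, which is the gap consistent hashing closes.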
- Consistent Hashing (Optimized Horizontal Sharding)
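- Select a large number as the size of the hash ring, and place every server at one or more points on it.
- Use the remainder of dividing a key's hash by that number as its position; the key belongs to the next server clockwise on the ring.
- Low-overhead migration: adding or removing a server only moves the keys next to that server's points.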
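A minimal consistent-hashing ring along those lines is sketched below. The ring size of 2^32, the use of SHA-256, and the 100 virtual nodes per server are all assumptions chosen for the example; the class and method names are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # the "large number"; an assumed value


def ring_position(name):
    """Hash a string and take the remainder to get a point on the ring."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


class ConsistentHashRing:
    def __init__(self, servers, virtual_nodes=100):
        self.virtual_nodes = virtual_nodes
        self._points = []  # sorted list of (position, server)
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        # Each server owns several points so the load spreads more evenly.
        for i in range(self.virtual_nodes):
            bisect.insort(self._points, (ring_position(f"{server}#{i}"), server))

    def remove_server(self, server):
        self._points = [p for p in self._points if p[1] != server]

    def server_for(self, key):
        # First point at or after the key's position, wrapping around the ring.
        i = bisect.bisect_left(self._points, (ring_position(key),))
        return self._points[i % len(self._points)][1]


keys = [f"user-{i}" for i in range(2000)]
ring = ConsistentHashRing(["s1", "s2", "s3"])
before = {k: ring.server_for(k) for k in keys}
ring.add_server("s4")
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to the new server")
```

Only keys that now fall just before one of the new server's points change owner, so the data that migrates is roughly the new server's share rather than most of the data set.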
- Replica (3 times)
- Avoid data loss.
- Share data read requests.
- Store at different locations.