Consistent Hashing (Sharding)
- Basic Sharding: Integer Modulo X % N.
Inconsistent. The reminder changes sharply when N changes. Lots of data migration pressure. - Consistent Hashing
- Basic Solution:
- Divide the circle into 2^64 points.
- Evenly distribute the machine on the circle.
- Calculate the corresponding data point on the circle based on the data-id (fixed between searches).
- Data request finds the next available machine starting from the along the circle (clockwise).
- Improved Solution (Micro shards / Virtual Nodes)
- Map N machines with 1000N virtual node.
- Introduce 1000N virtual nodes and randomly / evenly distribute the virtual nodes on the circle.
- Data request finds the next available virtual nodes and finds the corresponding machine. (Binary search)
- Implementation:
- It doesn’t need to store the data point but need to store the machine point.
- Use treemap (red-black-tree). Find the next available machine point with time complexity of $\latex O(logN)$.
- Mantannice:
- Add a new machine.
- Randomly add 1000 virtual nodes. Ask for all the next available virtual nodes to migrate data.
- Delete an old machine.
- Migrate the data to the next available virtual nodes.
- Add a new machine.
- Basic Solution:
Replica
- Backup vs Replica
- Backup: offline / periodical / doesn’t read (Not busy time, Cheap)
- Replica: online / same-time / read (Runtime, Expensive)
- MySQL Replica: Master-slave
- Master write. Slave read. Slave sync from the master.
- (Comes with the feature) Write ahead log
- SQL logs all operations.
- When a slave is active, it informs the master.
- The master notifies the slave to read the log. (Sync log not value) (The user only waits for the write to master, not the write to slave).
- The slaves always have a delay.
- If the master dies, the slave will be promoted to master. Slight data loss and inconsistency.
- NoSQL: Find three virtual nodes in the consistent hashing ring and write three times.
- Difference:
- SQL:
- Auto Replica – Master-Slave
- Manual Replica – Consistent Hashing triple stores
- NoSQL:
- Auto Replica – Consistent Hashing triple stores
- Manual Replica – Not necessary
- SQL:
Design Tiny URL
- Workflow
- Request to tiny URL service
- Redirect to target website
- Go to target website
- Scenario
- 100 DAU
- Publish QPS = 100M * 0.1 / 86,400 = 100; Peak = 200
- Read QPS = 100M * 1 / 86,400 = 1000; Peak = 2k
- Storeage: 100M * 0.1 * 100bytes = 1G per day
- Serive
- UrlService.encode -> GET
- UrlService.decode -> POST
- Storage:
- No Transaction
- No SQL Features
- No Complicated
- No Sequential ID (MySQL provides incremental ID. NoSQL doesn’t provide)
- Algorithm:
- Hash function.
- Pros: Fast
- Cons: Conflict
- Random generation. Remove duplicates.
- Pros: Easy implementation
- Cons: Performance gets slower after more URLs.
- Implementation: ShortKey, LongUrl. Double Index. (NoSQL, row_key + column_key)
- Transfer to Base62
- Base 62 (0-9, a-z, A-Z)
- Pros: Efficient
- Cons: Relay on global incremental ID. (SQL)
- Implementation: SQL
- Hash function.
- Scale:
- Performance:
- Cache
- Cache aside.
- Look shortUrl in the cache. If it doesn’t find it, read the data in the database.
- Location-Based
- Server:
- DNS matches users with servers at different locations
- Database:
- Centralized Database but distributed Memcached.
- Webserver access database server is faster than the user access database server.
- Server:
- Multiple Database:
- Purpose: Solve QPS problem. Not storage problem.
- Horizontal Sharding.
- Problem:
Sharding based short. How to determine where is long.
Sharding based long. How to determine where is short. - Solution:
- Duplicate
- S -> L. S as sharding key.
- L -> S. L as sharding key.
- One long URL maps to multiple short URLs
- Extend the short URL.
- Add the long URL sharding key as one digit.
- Use 62 virtual nodes for sharding.
- Duplicate
- Problem:
- Multi-Region
- Sharing based on location of the service.
- Store hot short URLs in the nearby database server. – Training
- Cache
- Timeout
- Better not.
- If need to timeout, use timeout service.
- Custom URL
- Create a custom URL table. Mix with a random short URL.
- Performance:
Leave a comment