Stay humble. Stay hungry. Stay foolish.

Step 1 – Understand the problem and establish design scope

  • Functionality. Volume. Length. Characters. URL be deleted or updated.

Back of the envelop estimation

  • Write operation: 100 million URLs are generated per day. Per second: 100 million / 24 / 3600 = 1160.
  • Read operation: Assuming read to write ratio is 10:1. Per second: 1160 * 10 – 11,600
  • Assuming URL shortener service will run for 10 years, it means support 100 million * 365 * 10 = 365 billion records, 365TB storage assuming average URL length is 100.

Step 2 – Propose high-level design and get buy-in

API Endpoints

RESTful API

  • URL shortening. Sends a POST request.
POST api/v1/data/shorten
- request parameter: {longUrl: longURLString}
- return shortURL
  • URL redirecting. Sends a GET request.
GET api/v1/ShortURL
- return longURL for HTTP redirction.

URL redirecting

Server receives a tinyURL request, it changes the short URL to the long URL with 301/302 redirect. Use hash tables to implement storage.

  • 301 redirect: Permanent and cached in browser.
    • Use to reduce the server load.
  • 302 redirect: Temporary and always sent to server first.
    • Use to get analytics.

URL shortening

Find a function that maps a long URL to the hash value. Hash function must (1-1 mapping):

  • Each longURL must be hashed to one hashValue.
  • Each hashValue can be mapped back to the longURL.

Step 3 – Design deep dive

Data model

Store short2long mapping in a relational database with in memory cache.

Hash function

Hash value length.

If use alphanumeric chars, then have 10 + 26 + 26 = 62 chars to use. Need to figure out 62^n >= 365 billion, which indicates n = 7 (3.5 trillion).

Hash + collusion resolution

Use well known hash functions (CRC32/MD5/SHA-1) to hash as 7 char strings. However the hash value is too long.

  • Collect first 7 characters of a hash value. Recursively append a new predefined string and rehash until no more collision is discovered.
  • Problem: Expensive to query database to detect collision.
  • Solution: Can use bloom filters to improve performance. In memory testing before hitting disk based hash table.

Base 62 conversion

Converts unique ID into a URL using base 62.

Comparison

Hash + collision resolution Base 62 conversion
Fixed short URL length. Unfixed short URL length. Goes up with ID.
No need a unique ID generator. Need a unique ID generator.
Collision is possible and need to resolve. Collusion is impossible.
Possible to figure out next available URL. Easy to figure out next available URL. Security risk.

URL shortening deep dive

  • Input longURL.
  • Check longURL in database?
    • Yes. Return shortURL.
    • No.
      • Generate a new ID.
      • Convert ID to shortURL.
      • Save ID, shortURL, longURL in database.

URL redirecting deep dive

  • User requests shortURL.
  • Load balancer forwards to web servers.
  • If shortURL in cache, then return longURL.
  • If not in cache, fetch the longYRL from the database, cache, return.

Step 4 – Wrap Up

  • Rate Limiter. Filter based on IP address to avoid malicious users.
  • Web server scaling. Stateless. Easy to scale.
  • Database scaling. Replica and sharding.
  • Analytics: How many users clicking? When do they click? etc.
  • Availability, consistency and reliability.

Leave a comment