How to design Uber
- Uber Techniques
- RingPop: application-layer sharding, so each web server also owns a shard of the data (web server and database in the same place).
- TChannel: RPC protocol
- Google S2: library for location-based storage and queries
- Riak: open-source Dynamo-style database
- Load Balancing (web servers)
- No need for consistent hashing; web servers are stateless.
- Random assignment, or
- Based on each web server's current workload (see the sketch after this list).
- Implementation
- Hardware load balancer
- Software load balancer, e.g., Nginx
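A minimal sketch of the two stateless strategies above; the server names and load counters are hypothetical:

```python
import random

# Hypothetical pool of stateless web servers mapped to their current
# number of active connections.
servers = {"web-1": 12, "web-2": 7, "web-3": 19}

def pick_random():
    # Random assignment works because web servers hold no session state.
    return random.choice(list(servers))

def pick_least_loaded():
    # Workload-based assignment: route to the least-busy server.
    return min(servers, key=servers.get)
```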
- Scenario
- Features
- Basic
- Drivers report locations: a heartbeat, one report every 4 seconds.
- A rider requests a ride; match a driver with the rider.
- Improvements:
- Deny / Accept
- Cancel
- Pick up / Drop off
- Add-On
- Pool / Eats
- Requirement
- Ask the interviewer: how many drivers / riders, and what service area?
- Drivers: assume 200K online. Average QPS = 200K / 4 = 50K (one heartbeat every 4 seconds). Peak QPS ≈ 50K * 3 = 150K. (These are DB writes.)
- Riders: negligible by comparison (see the back-of-envelope sketch below).
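A back-of-envelope check of the numbers above; the 200K online drivers and 3x peak factor are the stated assumptions:

```python
# Back-of-envelope estimate for driver location writes.
online_drivers = 200_000      # assumed number of concurrent drivers
report_interval_s = 4         # one heartbeat every 4 seconds
peak_factor = 3               # assumed peak-to-average ratio

avg_write_qps = online_drivers / report_interval_s    # 50,000
peak_write_qps = avg_write_qps * peak_factor          # 150,000
print(avg_write_qps, peak_write_qps)                  # 50000.0 150000.0
```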
- Service
- Basic Structure: both services talk to users
- Driver <-> Geo Service
- Rider <-> Dispatch Service
- Improved Structure: only the Dispatch Service talks to users
- Driver <-> Dispatch Service (driver saves location)
- Rider <-> Dispatch Service (rider finds drivers nearby)
- Dispatch Service <-> Geo Service
- Storage
- Data
- Dispatch Service = Dispatch Web Server + Trip Table Database (Read > Write)
- Geo Service = Geo Web Server + Location Table (Write > Read) + Driver Table
- Schema
- Trip Table: trip_id; driver_id; rider_id; latitude; longitude; status; created_at;
- Location Table: driver_id; latitude; longitude; updated_at; (a typed sketch of both tables follows)
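The same schema as a Python sketch; the field names come from the tables above, while the types and example status values are assumptions:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Trip:
    trip_id: str
    driver_id: str
    rider_id: str
    latitude: float
    longitude: float
    status: str            # e.g. "matching", "dispatched", "accepted"
    created_at: datetime

@dataclass
class Location:
    driver_id: str
    latitude: float
    longitude: float
    updated_at: datetime
```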
- Query Location:
- Naive solution: SQL range query on latitude and longitude. Slow, since an index can efficiently serve only one of the two range predicates.
- Geo Service:
- Google S2.
- Hilbert Curve
- Maps 2D space to 1D space (the globe is divided into up to 2^64 cells).
- If two points are close in 2D, they are usually close in 1D as well (the Hilbert curve preserves locality).
- Geohash
- Peano (Z-order) curve: repeated binary subdivision.
- Base32 alphabet: 0-9 and a-z, with a, i, l, o removed.
- If two geohash strings share a common prefix, the two points are close.
- Details:
- With a longer common prefix, the maximum possible distance is smaller.
- Calculate
- First, cut longitude: left (west) half 0, right (east) half 1.
- Second, cut latitude: lower (south) half 0, upper (north) half 1.
- Alternate between the two, interleaving the bits.
- Every 5 bits form one base32 digit (a runnable sketch follows this list).
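A runnable sketch of the encoding steps above; `geohash_encode` is an illustrative helper, not a library function:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # base32 without a, i, l, o

def geohash_encode(lat, lng, precision=6):
    """Interleave longitude/latitude bits, packing every 5 bits
    into one base32 digit."""
    lat_range, lng_range = [-90.0, 90.0], [-180.0, 180.0]
    result, bits, bit_count = [], 0, 0
    cut_longitude = True              # start by cutting longitude
    while len(result) < precision:
        rng, val = (lng_range, lng) if cut_longitude else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:                # right/upper half -> bit 1
            bits = (bits << 1) | 1
            rng[0] = mid
        else:                         # left/lower half -> bit 0
            bits <<= 1
            rng[1] = mid
        cut_longitude = not cut_longitude
        bit_count += 1
        if bit_count == 5:            # every 5 bits -> one base32 digit
            result.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(result)

print(geohash_encode(37.7749, -122.4194))  # "9q8yyk" (San Francisco)
```

Note how the prefix property falls out of the construction: every additional character narrows the cell, so two drivers sharing a 5-digit prefix sit in the same cell of a few kilometers.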
- Implementation
- Method
- SQL: index the geohash column; prefix match with a LIKE query.
- NoSQL (Cassandra): geohash as the column key; range query.
- NoSQL (Redis / Memcached): store driver sets under the 4-, 5-, and 6-digit geohash prefixes (cells roughly 20 km down to 600 m across). On each location update, add the driver to the new cells and delete it from the old ones.
- Choice
- SQL: okay but slow
- LIKE queries are slow.
- Drivers update locations constantly; the indexed column changes, forcing B+ tree rebalancing. Low efficiency.
- NoSQL (Cassandra): okay but slow
- The column key is indexed and not meant to change, yet a driver's geohash changes constantly.
- NoSQL (Memcached): bad
- Not persistent.
- Doesn't support sets.
- NoSQL (Redis): good
- QPS > 100K.
- Persistent.
- Supports lists, sets, etc. (a set of drivers per geohash cell).
- Redis Implementation
- Rider: location -> geohash -> nearby drivers (see the sketch after this list)
- Location Table (frequent writes)
- Key: geohash
- Value: set of driver ids
- Driver Table (frequent writes)
- Key: driver_id
- Value: lat, lng, status, updated_at, trip_id
- Driver:
- Report location: compute the geohash; update the Driver Table, and update the Location Table if the geohash changed.
- Accept booking: change the trip status; mark the driver unavailable in the Driver Table.
- Finish trip: change the trip status; mark the driver available again.
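A minimal sketch of the two Redis tables and the driver/rider flows above, assuming the redis-py client; the key layouts (`geo:<prefix>`, `driver:<id>`) are illustrative, and `geohash_encode` is the helper sketched earlier:

```python
import time
import redis

r = redis.Redis()  # assumes a reachable Redis instance

def report_location(driver_id, lat, lng):
    """Driver heartbeat: update the Driver Table; move the driver between
    Location Table cells only when the geohash changed."""
    new_hash = geohash_encode(lat, lng, precision=6)
    old = r.hget(f"driver:{driver_id}", "geohash")
    old_hash = old.decode() if old else None
    if old_hash != new_hash:
        for n in (4, 5, 6):           # keep sets at 4-, 5-, 6-digit prefixes
            if old_hash:
                r.srem(f"geo:{old_hash[:n]}", driver_id)
            r.sadd(f"geo:{new_hash[:n]}", driver_id)
    r.hset(f"driver:{driver_id}", mapping={
        "lat": lat, "lng": lng, "geohash": new_hash,
        "updated_at": time.time(),
    })

def nearby_drivers(lat, lng):
    """Rider query: try the ~600 m cell first, then fall back to the
    coarser ~5 km and ~20 km cells if it is empty."""
    h = geohash_encode(lat, lng, precision=6)
    for n in (6, 5, 4):
        drivers = r.smembers(f"geo:{h[:n]}")
        if drivers:
            return drivers
    return set()
```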
- Overall
- Trip Table (SQL)
- Query by trip_id, rider_id, driver_id, etc.
- Location Table / Driver Table (Redis)
- Overall Solution (a code sketch follows the steps)
1. The rider sends a request; the server creates a trip.
- Returns the trip id to the rider.
- The rider periodically polls the server to see if the match succeeded (or keeps a socket open).
2. The server finds a matching driver and writes the driver id into the trip.
- Marks the driver unavailable.
3. The driver reports his location.
- The server sees an allocated trip id in the Driver Table.
- It looks up the trip in the Trip Table and returns it to the driver.
4. The driver accepts.
- Update the Driver Table and the Trip Table.
- The rider gets the matched driver's information.
5. The driver declines.
- Update the Driver Table and the Trip Table.
- Redo step 2.
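A rough in-memory sketch of that workflow; the dict-based tables and function names are illustrative stand-ins for the SQL Trip Table and Redis Driver Table:

```python
import uuid

trips = {}    # stand-in for the Trip Table (SQL)
drivers = {}  # stand-in for the Driver Table (Redis)

def request_ride(rider_id, lat, lng):
    """Step 1: create a trip and return its id so the rider can poll."""
    trip_id = str(uuid.uuid4())
    trips[trip_id] = {"rider_id": rider_id, "lat": lat, "lng": lng,
                      "driver_id": None, "status": "matching"}
    return trip_id

def match_driver(trip_id, nearby_driver_ids):
    """Step 2: write an available driver into the trip and mark him
    unavailable."""
    for d in nearby_driver_ids:
        if drivers.get(d, {}).get("status") == "available":
            trips[trip_id].update(driver_id=d, status="dispatched")
            drivers[d].update(status="dispatched", trip_id=trip_id)
            return d
    return None

def driver_heartbeat(driver_id):
    """Step 3: piggyback any allocated trip on the location report."""
    trip_id = drivers.get(driver_id, {}).get("trip_id")
    return trips.get(trip_id) if trip_id else None

def driver_respond(driver_id, accept):
    """Steps 4-5: accept locks in the match; decline frees the driver
    and returns the trip to matching (redo step 2)."""
    trip_id = drivers[driver_id]["trip_id"]
    if accept:
        trips[trip_id]["status"] = "accepted"
    else:
        trips[trip_id].update(driver_id=None, status="matching")
        drivers[driver_id].update(status="available", trip_id=None)
```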
- Scale
- DB Sharding
- What if a single Redis server goes down (or can't handle the load)?
- Use multiple Redis / Cassandra machines.
- Purpose
- Share QPS
- Avoid a single point of failure
- Sharding Key
- City ID.
- Deciding the city: use a geo-fence (a polygon drawn around each city).
- Include nearby cities in the query, for riders near a city border.
- Check whether the rider is in an airport
- Geo-fence.
- Check which city first, then check the airport geo-fences within that city (see the sketch after this list).
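A minimal sketch of geo-fence lookup plus city-keyed sharding; the ray-casting test is the standard point-in-polygon algorithm, while the polygon data and shard count are made up:

```python
def point_in_polygon(lat, lng, polygon):
    """Standard ray-casting test: a point is inside if a ray from it
    crosses the polygon's edges an odd number of times."""
    inside = False
    n = len(polygon)
    for i in range(n):
        (lat1, lng1), (lat2, lng2) = polygon[i], polygon[(i + 1) % n]
        if (lat1 > lat) != (lat2 > lat):   # edge straddles this latitude
            cross = lng1 + (lat - lat1) * (lng2 - lng1) / (lat2 - lat1)
            if lng < cross:
                inside = not inside
    return inside

# Made-up geo-fences: city_id -> polygon of (lat, lng) vertices.
CITY_FENCES = {
    1: [(37.0, -123.0), (38.0, -123.0), (38.0, -122.0), (37.0, -122.0)],
}

def city_of(lat, lng):
    for city_id, fence in CITY_FENCES.items():
        if point_in_polygon(lat, lng, fence):
            return city_id
    return None

def redis_shard(city_id, num_shards=16):
    # City ID is the sharding key; each shard is one Redis instance.
    return city_id % num_shards
```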
- How to reduce the impact of a database crash
- Replication
- Replication by Redis: built-in master-slave replication.
- Replication by yourself: virtual sharding, writing each virtual shard to more than one physical machine (a sketch follows this list).
- Replication by the NoSQL store: Riak / Cassandra replicate internally.
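A toy sketch of the do-it-yourself option: keys hash to virtual shards, and each virtual shard is written to two distinct physical nodes, so one crashed machine loses no data. The hostnames and shard count are made up:

```python
import zlib
import redis

# Made-up pool of physical Redis instances.
NODES = [redis.Redis(host=h) for h in ("redis-a", "redis-b", "redis-c")]
NUM_VIRTUAL_SHARDS = 64

def replicas_for(key):
    """Key -> virtual shard -> a primary node plus the next node as backup."""
    shard = zlib.crc32(key.encode()) % NUM_VIRTUAL_SHARDS
    primary = shard % len(NODES)
    backup = (primary + 1) % len(NODES)
    return NODES[primary], NODES[backup]

def replicated_set(key, value):
    # Write both copies; reads can fall back to the backup on failure.
    for node in replicas_for(key):
        node.set(key, value)
```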