
Design Location-Based Service

How to design Uber

  1. Uber Techniques
    1. RingPop: application-layer sharding via consistent hashing and gossip, so each node serves requests and owns its slice of data (web server and database colocated).
    2. TChannel: network multiplexing and framing protocol for RPC.
    3. Google S2: Location-based storage and query
    4. Riak: open-source Dynamo-style key-value store.
  2. Load Balancing (web servers)
    1. No need for consistent hashing; the web servers are stateless, so pick:
      1. Randomly (round-robin).
      2. Based on the web server's current workload.
    2. Implementation
      1. Hardware load balancer.
      2. Software: Nginx (sketched below).
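
A minimal sketch of the software option, assuming hypothetical backend addresses: the default (no directive) gives round-robin spreading, while `least_conn` picks by current workload.

```nginx
# Hypothetical upstream pool of stateless web servers.
upstream web_servers {
    least_conn;              # pick the server with the fewest active connections
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://web_servers;
    }
}
```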
  3. Scenario
    1. Features
      1.  Basic
        1. Driver reports location: heartbeat, one report every 4 seconds.
        2. Rider requests a ride; match a driver with the rider.
      2. Improve:
        1. Deny / Accept
        2. Cancel
        3. Pick up / Drop off
      3. Add-On
        1. Pool / Eats
    2. Requirement
      1. Ask about the number of drivers, the number of riders, and the service area.
      2. Driver: assume 200K online. Avg QPS = 200K / 4 = 50K. Peak QPS = 50K * 3 = 150K. (DB writes; worked below.)
      3. Rider: negligible by comparison.
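
The back-of-envelope estimate above as a tiny sketch, with the numbers taken from these notes:

```python
# Driver location writes: 200K drivers online, one report every 4 s,
# peak assumed to be 3x the average.
drivers = 200_000
interval_s = 4
avg_qps = drivers / interval_s   # 50,000 writes/s
peak_qps = avg_qps * 3           # 150,000 writes/s
print(avg_qps, peak_qps)
```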
  4. Service
    1. Basic Structure: both services talk to users
      1. Driver <-> Geo Service
      2. Rider  <-> Dispatch Service
    2. Improved Structure: only the Dispatch Service talks to users
      1. Driver <-> Dispatch Service <-> Geo Service (driver saves location)
      2. Rider  <-> Dispatch Service <-> Geo Service (rider finds drivers nearby)
  5. Storage
    1. Data
      1. Dispatch Service = Dispatch Web Server + Trip Table Database (Read > Write)
      2. Geo Service = Geo Web Server + Location Table (Write > Read) + Driver Table
    2. Schema
      1. Trip Table: trip_id; driver_id; rider_id; latitude; longitude; status; created_at;
      2. Location Table: driver_id; latitude; longitude; updated_at;
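
A minimal SQL sketch of the two schemas (column types are illustrative assumptions), plus the naive range query discussed next:

```sql
CREATE TABLE trip (
    trip_id    BIGINT PRIMARY KEY,
    driver_id  BIGINT,
    rider_id   BIGINT,
    latitude   DOUBLE,
    longitude  DOUBLE,
    status     TINYINT,       -- e.g. created / matched / accepted / finished
    created_at TIMESTAMP
);

CREATE TABLE location (
    driver_id  BIGINT PRIMARY KEY,
    latitude   DOUBLE,
    longitude  DOUBLE,
    updated_at TIMESTAMP
);

-- Naive nearby-drivers query: a 2D range scan, slow without a spatial index.
SELECT driver_id
FROM location
WHERE latitude  BETWEEN 37.74 AND 37.80
  AND longitude BETWEEN -122.45 AND -122.39;
```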
    3. Query Location:
      1. Naive Solution: SQL range query on latitude and longitude. Slow without a spatial index (see the query in the schema sketch above).
      2. Geo Service:
        1. Google S2.
          1. Hilbert Curve 
          2. Maps 2D space to 1D space (the globe is divided into cells addressed by 64-bit IDs, i.e. up to 2^64 values).
          3. If two points are close on the 1D curve, they are close in 2D as well; closeness in 2D is mostly, though not always, preserved in 1D.
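
A small sketch with the s2sphere package (an assumed-available Python port of Google S2): a lat/lng maps to a 64-bit cell ID, i.e. a position on the Hilbert curve.

```python
import s2sphere  # assumed: pip install s2sphere

lat, lng = 37.7749, -122.4194                    # San Francisco
leaf = s2sphere.CellId.from_lat_lng(
    s2sphere.LatLng.from_degrees(lat, lng))      # level-30 leaf cell
cell = leaf.parent(12)                           # coarser cell for indexing

print(leaf.id())   # 64-bit integer: position on the Hilbert curve
print(cell.id())   # nearby points share coarse ancestor cells
```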
        2. Geohash
          1. Z-order (Peano-type) curve: repeated two-way division.
          2. Base32: 0-9 plus b-z (a, i, l, o removed).
          3. If two geohash strings share a common prefix, the points are close; the converse can fail at cell boundaries.
          4. Details:
            1. The longer the common prefix, the smaller the maximum distance.
            2. Calculate
              1. First cut on longitude: left (west) half 0, right (east) half 1.
              2. Then cut on latitude: bottom (south) half 0, top (north) half 1.
              3. Alternate longitude and latitude cuts.
              4. Every 5 bits form one base32 character (encoder sketched below).
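
A minimal encoder following exactly these steps, to make the bit-interleaving concrete:

```python
# Geohash alphabet: 0-9 plus b-z with a, i, l, o removed.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    bits, bit_count, result = 0, 0, []
    even = True  # start with a longitude cut
    while len(result) < precision:
        if even:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:              # right (east) half -> 1
                bits, lng_lo = (bits << 1) | 1, mid
            else:                       # left (west) half -> 0
                bits, lng_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:              # top (north) half -> 1
                bits, lat_lo = (bits << 1) | 1, mid
            else:                       # bottom (south) half -> 0
                bits, lat_hi = bits << 1, mid
        even = not even
        bit_count += 1
        if bit_count == 5:              # every 5 bits -> one base32 character
            result.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(result)

print(geohash_encode(37.7749, -122.4194))  # "9q8yyk" for San Francisco
```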
          5. Implementation
            1. Method
              1. SQL: index a geohash column; prefix LIKE query.
              2. NoSQL (Cassandra): geohash as the column key; range query.
              3. NoSQL (Redis / Memcached): store driver sets under 4/5/6-character prefixes (cells roughly 600 m – 20 km across); on each location update, delete the driver from the old cell and add it to the new one.
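
The SQL variant in two statements, assuming a geohash column has been added to the location table; '9q8yy' is an illustrative 5-character prefix:

```sql
CREATE INDEX idx_location_geohash ON location (geohash);

-- All drivers whose geohash starts with the rider's 5-character prefix.
SELECT driver_id
FROM location
WHERE geohash LIKE '9q8yy%';
```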
            2. Choice
              1. SQL: workable but slow
                1. LIKE queries are slow.
                2. Drivers update locations constantly; the indexed column keeps changing, forcing B+ tree updates. Low efficiency.
              2. NoSQL (Cassandra): workable but slow
                1. The column key is indexed and shouldn't change frequently.
              3. NoSQL (Memcached): bad
                1. Not persistent.
                2. Doesn't support sets.
              4. NoSQL: Redis
                1. QPS > 100K 
                2. Persistent
                3. Supports list, set, etc. (a set of drivers per cell).
            3. Redis Implementation
              1. Rider: Location -> GeoHash -> Nearby Drivers
                1. Location Table (Frequent write)
                  1. Key: geohash
                  2. Value: driver ids
                2. Driver Table (Frequent write)
                  1. Key: driver_id
                  2. Value: lat, lng, status, updated_at, trip_id
              2. Driver:
                1. Report Location: compute the geohash -> update the Location Table and Driver Table if the geohash changed.
                2. Accept Booking: change the trip status; mark the driver unavailable in the Driver Table.
                3. Finish Trip: change the trip status; mark the driver available again.
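
A sketch of the two Redis tables using redis-py (assumed installed); key names are illustrative, and `geohash_encode` is the encoder sketched earlier:

```python
import time
import redis  # assumed: pip install redis

r = redis.Redis()

def report_location(driver_id: str, lat: float, lng: float) -> None:
    """Driver heartbeat: move the driver between geohash cells if needed."""
    new_cell = geohash_encode(lat, lng, precision=5)
    old_cell = r.hget(f"driver:{driver_id}", "geohash")
    if old_cell is not None and old_cell.decode() != new_cell:
        r.srem(f"geo:{old_cell.decode()}", driver_id)   # leave the old cell
    r.sadd(f"geo:{new_cell}", driver_id)                # join the new cell
    r.hset(f"driver:{driver_id}", mapping={
        "lat": lat, "lng": lng, "geohash": new_cell,
        "updated_at": time.time(),
    })

def nearby_drivers(lat: float, lng: float) -> set:
    """Rider query: all driver ids in the rider's geohash cell."""
    return r.smembers(f"geo:{geohash_encode(lat, lng, precision=5)}")

def set_driver_status(driver_id: str, status: str, trip_id: str = "") -> None:
    """Accept booking / finish trip: flip availability in the driver table."""
    r.hset(f"driver:{driver_id}", mapping={"status": status, "trip_id": trip_id})
```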
        3. Overall
          1. Trip table (SQL)
            1. Query by trip_id, rider_id, driver_id, etc.
          2. Location table / Driver table (Redis)
  6. Overall Solution
    1. The rider sends a request; the server creates a trip.
      1. Return the trip id to the rider.
      2. The rider periodically polls the server for the match result (or keeps a socket open).
    2. The server finds a matching driver and writes the driver id into the trip.
      1. Mark the driver unavailable.
    3. The driver reports his location.
      1. The server sees an allocated trip id in the driver table.
      2. It looks up the trip in the trip table and returns it to the driver.
    4. The driver accepts.
      1. Update the driver table and the trip table.
      2. The rider's next poll returns the matched driver's information.
    5. The driver declines.
      1. Update the driver table and the trip table.
      2. Go back to step 2 and rematch. (Whole flow sketched below.)
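
A self-contained, in-memory sketch of the five steps above; real storage would be the trip table (SQL) and the Redis driver/location tables, and matching would use the nearby-drivers lookup instead of scanning all drivers:

```python
import itertools

trips, drivers = {}, {}            # trip_id -> trip, driver_id -> driver
_trip_ids = itertools.count(1)

def request_ride(rider_id, lat, lng):
    # 1. Create the trip and return the trip_id; the rider polls
    #    (or holds a socket) for the match result.
    trip_id = next(_trip_ids)
    trips[trip_id] = {"rider_id": rider_id, "driver_id": None,
                      "lat": lat, "lng": lng, "status": "created"}
    return trip_id

def match(trip_id):
    # 2. Pick an available driver, link them to the trip, mark unavailable.
    trip = trips[trip_id]
    for driver_id, d in drivers.items():
        if d["status"] == "available":
            trip["driver_id"], trip["status"] = driver_id, "matched"
            d["status"], d["trip_id"] = "dispatched", trip_id
            return driver_id

def on_driver_heartbeat(driver_id, lat, lng):
    # 3. The location report doubles as the channel for trip offers.
    d = drivers.setdefault(driver_id, {"status": "available", "trip_id": None})
    d["lat"], d["lng"] = lat, lng
    return trips.get(d["trip_id"]) if d["trip_id"] else None

def on_driver_reply(driver_id, trip_id, accepted):
    # 4. Accept: update both tables; the rider's next poll sees the match.
    # 5. Decline: free the driver, reset the trip, and rematch.
    if accepted:
        trips[trip_id]["status"] = "accepted"
    else:
        drivers[driver_id].update(status="available", trip_id=None)
        trips[trip_id].update(driver_id=None, status="created")
        match(trip_id)
```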
  7. Scale
    1. DB Sharding
      1. A single Redis server can go down
        1. Run multiple Redis / Cassandra nodes.
      2. Purpose
        1. Spread the QPS across shards.
        2. Avoid a single point of failure.
      3. Sharding Key
        1. City ID.
        2. Decide which city a location belongs to via geo-fence.
        3. Include nearby cities in queries for riders near a border (sketched below).
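
A sketch of city-keyed sharding; `NEARBY` is a hypothetical adjacency map so that queries near a border also hit neighboring cities' shards:

```python
import hashlib

NUM_SHARDS = 16
NEARBY = {"san_francisco": ["oakland", "daly_city"]}   # illustrative

def shard_for(city_id: str) -> int:
    # Deterministic hash of the city id picks the shard.
    digest = hashlib.md5(city_id.encode()).digest()
    return digest[0] % NUM_SHARDS

def shards_to_query(city_id: str) -> set:
    # Include nearby cities so riders near a border see their drivers too.
    return {shard_for(c) for c in [city_id, *NEARBY.get(city_id, [])]}
```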
    2. Check if the rider is at an airport
      1. Geo-fence: represent the airport as a polygon.
      2. First determine the city, then check the airport fences within it (point-in-polygon sketch below).
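
The core of a geo-fence check is a point-in-polygon test; a standard ray-casting sketch:

```python
def in_fence(lat: float, lng: float, polygon: list) -> bool:
    """polygon: list of (lat, lng) vertices, e.g. an airport boundary."""
    inside = False
    n = len(polygon)
    for i in range(n):
        (lat1, lng1), (lat2, lng2) = polygon[i], polygon[(i + 1) % n]
        # Does a ray cast from the point cross this edge?
        if (lng1 > lng) != (lng2 > lng):
            cross_lat = lat1 + (lng - lng1) / (lng2 - lng1) * (lat2 - lat1)
            if lat < cross_lat:
                inside = not inside
    return inside
```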
    3. How to reduce the impact of a database crash
      1. Replication
        1. Replication by Redis: master/slave (one line of config, below).
        2. Replication by yourself: virtual sharding.
        3. Replication by NoSQL: Riak / Cassandra.
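
The Redis option is a single directive on each replica (master address illustrative):

```
# redis.conf on the replica
replicaof 10.0.0.1 6379
```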
