Waypoint Duration Estimate Project 🚴📦

Introduction

  • Project Duration: ~4 months
  • Team:
    • 1 Backend Engineer (me)
    • 1 Data Scientist
    • 2 Mobile Engineers (Android & iOS)
  • Goal: Accurately estimate courier waypoint durations (pickups/dropoffs) across diverse conditions (vehicle type, geography, time of day, etc.)

Defining the Problem

What is a Waypoint?

A pickup or drop-off location in a delivery process. No other types of waypoints were considered.

Why are Waypoint Duration Estimates Critical?

  • They determine which courier is best suited for an assignment
  • If we underestimate the duration → new jobs get assigned too soon
  • If we overestimate the duration → couriers sit idle, slowing down deliveries

Example Scenario:

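Courier A (10 min away)
Courier B (finishing a nearby delivery)

Which courier should get the job depends on how long Courier B's current waypoint will actually take.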
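To make the scenario concrete, here is a minimal sketch of the comparison. Only Courier A's 10 minutes comes from the scenario; the `Courier` fields, the helper names, and every number for Courier B are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Courier:
    name: str
    remaining_waypoint_min: float  # estimated time left at the current waypoint
    travel_min: float              # travel time to the new pickup


def arrival_eta(courier: Courier) -> float:
    """Minutes until the courier could reach the new pickup."""
    return courier.remaining_waypoint_min + courier.travel_min


def best_courier(couriers):
    """Pick the courier with the earliest estimated arrival."""
    return min(couriers, key=arrival_eta)


a = Courier("A", remaining_waypoint_min=0.0, travel_min=10.0)
# Hypothetical: we estimate B needs 3 more minutes at the drop-off...
b_estimated = Courier("B", remaining_waypoint_min=3.0, travel_min=2.0)
# ...but the drop-off really takes 11 minutes (e.g. a high-rise).
b_actual = Courier("B", remaining_waypoint_min=11.0, travel_min=2.0)

print(best_courier([a, b_estimated]).name)  # chosen with the estimate
print(best_courier([a, b_actual]).name)     # who would really arrive first
```

With the underestimated duration, B wins the assignment even though A would have arrived sooner, which is the "assigned too soon" failure mode.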

The Baseline Approach

What existed before?

  • Pickup durations: A simple lookup table with a few hardcoded values (e.g. fast food vs deep dish pizza)
  • Drop-off durations: A fixed value nationwide

Funny Example 🤯

A suburban house and a 50-story NYC high-rise both had a 5-minute drop-off estimate.

Reality? NYC deliveries take way longer!

Suburban House
🏠
🚲 ➡️ 📦 ➡️ 🚪

Quick & straightforward

~2-3 minutes

NYC High-rise
🏢
🚲 ➡️ 🚪 ➡️ 🛗 ➡️ 📦

Multiple steps & waiting

~8-12 minutes

Phase 1: Collecting Ground Truth Data

Goal: Measure real-world waypoint durations using GPS data.

How We Did It:

  • 🪨 Captured GPS coordinates from couriers every 15 seconds
  • ✅ Filtered GPS coordinates to remove noise
  • ✅ Calculated duration within X meters of waypoints for every delivery
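The duration calculation can be sketched roughly as follows. This is an illustration only, not the production pipeline: the `(unix_seconds, lat, lon)` ping shape, the function names, the 50 m radius, and the "first to last ping inside the radius" rule are all assumptions.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6_371_000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def dwell_seconds(pings, wp_lat, wp_lon, radius_m):
    """Seconds between the first and last ping within radius_m of the waypoint.

    pings: iterable of (unix_seconds, lat, lon) tuples.
    """
    inside = [t for t, lat, lon in pings
              if haversine_m(lat, lon, wp_lat, wp_lon) <= radius_m]
    if not inside:
        return 0.0
    return max(inside) - min(inside)


# Made-up pings at the 15-second cadence, around a waypoint at (40.0, -74.0).
pings = [
    (0, 40.0100, -74.0),   # roughly 1.1 km away
    (15, 40.0001, -74.0),  # roughly 11 m away
    (30, 40.0000, -74.0),
    (45, 40.0002, -74.0),
    (60, 40.0100, -74.0),  # left the area
]
dwell = dwell_seconds(pings, 40.0, -74.0, radius_m=50.0)
print(dwell)
```

With a 15-second sampling interval, a measurement like this can be off by up to one interval at each end, so individual values are coarse even before GPS noise is considered.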

Challenges:

  • GPS noise – We were surprised by how noisy this data can be, even on modern phones
  • Processing scale – We needed a low-overhead way to process very large amounts of data, so we used Apache Beam (Dataflow)
  • Incorrect data – Some waypoints had incorrect GPS coordinates, though couriers were finding the right location anyway; we corrected the data upstream
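As one illustration of dealing with GPS noise, the sketch below drops pings that imply an implausible speed since the last kept ping. This is a common, simple technique and not necessarily the filter the project used; the 40 m/s threshold, the `(unix_seconds, lat, lon)` ping shape, and the function names are assumptions.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def drop_speed_outliers(pings, max_speed_mps=40.0):
    """Keep pings whose implied speed from the previous kept ping is plausible."""
    kept = []
    for t, lat, lon in sorted(pings):
        if kept:
            prev_t, prev_lat, prev_lon = kept[-1]
            dt = t - prev_t
            if dt <= 0:
                continue  # duplicate or out-of-order timestamp
            if haversine_m(prev_lat, prev_lon, lat, lon) / dt > max_speed_mps:
                continue  # implied jump is faster than any courier vehicle
        kept.append((t, lat, lon))
    return kept


# Made-up track with one ping that jumps about 2 km in 15 seconds.
raw = [
    (0, 40.0000, -74.0),
    (15, 40.0001, -74.0),
    (30, 40.0200, -74.0),  # noise spike
    (45, 40.0002, -74.0),
]
clean = drop_speed_outliers(raw)
print(len(clean))
```

A filter like this trusts the first ping, so a bad first fix would cause good pings to be dropped; a real pipeline would likely combine it with the device-reported accuracy or some smoothing.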

Phase 2: Early Impact

Goal: Quickly deliver value from our Phase 1 data collection without over-engineering.

What Changed?

  • Updated pickup estimates using fresher data
  • Moved to per-city, per-vehicle type dropoff estimates
  • Added alerting on incorrect pickup waypoint locations (e.g. completed deliveries where couriers were never at the pickup location)
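A per-city, per-vehicle lookup could look something like the sketch below. The use of the median, the tuple shape of the observations, the function names, and the sample numbers are assumptions; the 5-minute fallback mirrors the old fixed drop-off value from the baseline example.

```python
from collections import defaultdict
from statistics import median


def build_dropoff_table(observations):
    """Map (city, vehicle) to the median measured drop-off duration in minutes.

    observations: iterable of (city, vehicle, duration_min) tuples.
    """
    groups = defaultdict(list)
    for city, vehicle, duration_min in observations:
        groups[(city, vehicle)].append(duration_min)
    return {key: median(values) for key, values in groups.items()}


def estimate_dropoff(table, city, vehicle, default_min=5.0):
    """Look up the estimate, falling back to a fixed value for unseen pairs."""
    return table.get((city, vehicle), default_min)


# Made-up measurements for illustration.
observations = [
    ("nyc", "bike", 9.0), ("nyc", "bike", 11.0), ("nyc", "bike", 10.0),
    ("austin", "car", 2.0), ("austin", "car", 3.0),
]
table = build_dropoff_table(observations)
print(estimate_dropoff(table, "nyc", "bike"))
print(estimate_dropoff(table, "nyc", "car"))  # no data, so the default applies
```

The appeal of this step is that it needs no model serving at all: the table can be rebuilt offline from the Phase 1 data and shipped as configuration.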

Phase 3: ML Model

Goal: Use machine learning to make predictions more accurate.

Key Features Used in the Model:

  • Courier's vehicle type (bike, car, scooter)
  • City & Grid Location (S2 grid cells)
  • Time of Day (rush hour vs. late night)
  • Various Pickup specific features (e.g. business type, order size, etc.)

Model Training & Deployment:

  • Implemented a Random Forest model in TensorFlow, optimized for Mean Absolute Error (MAE) because MAE is less sensitive to outliers than squared-error metrics
  • Deployed via TensorFlow Serving in Kubernetes
  • Added model versioning (S3 bucket) & batch processing (gRPC)
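The choice of MAE can be illustrated with a toy example: for a single constant prediction, the mean minimizes squared error while the median minimizes absolute error, so one extreme duration pulls a squared-error optimum much further. The sample durations below are made up, and this is not the model's training code.

```python
from statistics import mean, median

# Made-up waypoint durations in minutes, with one outlier.
durations = [3, 4, 4, 5, 45]


def mae(pred, ys):
    """Mean absolute error of a constant prediction."""
    return sum(abs(y - pred) for y in ys) / len(ys)


def mse(pred, ys):
    """Mean squared error of a constant prediction."""
    return sum((y - pred) ** 2 for y in ys) / len(ys)


best_for_mse = mean(durations)    # dragged upward by the 45-minute outlier
best_for_mae = median(durations)  # stays with the typical deliveries

print(best_for_mse, best_for_mae)
print(mae(best_for_mae, durations), mae(best_for_mse, durations))
```

Here the squared-error optimum lands above 12 minutes even though four of the five deliveries took 5 minutes or less, while the absolute-error optimum stays at 4 minutes.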

Impact: Significantly reduced mean absolute error, which allowed the matching/routing logic to be more aggressive

Phase 4: "Fences" API

Problem: Even with a highly accurate duration estimate, we had a problem identifying the actual moment couriers entered waypoints.

Solution: Evaluate waypoint entry on the client device (iOS/Android) through a geofence-like API

How It Worked:

✅ Client-Side Processing:

  • 📱 Realtime: Direct access to GPS, accelerometer, and other sensors without additional battery drain from streaming
  • 📊 Complete Sensor Dataset: Process raw sensor data locally instead of limiting what we stream to servers
  • ⚡ Reduced Backend Load: Process high-frequency events on device

Sample conditions:

  • 📍 Distance from latitude/longitude
  • ⏳ Time spent within X meters of latitude/longitude
  • 🚶‍♂️🚗 Motion activity (walking, stationary, driving)

Example API Logic:

location_condition AND (motion_walking OR motion_stationary)

Where:

  • location_condition = within 100m for 30+ sec
  • motion_walking = accelerometer detects walking
  • motion_stationary = accelerometer detects that the courier is stationary
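The example logic above can be sketched as a small evaluator. The real evaluation ran on the iOS/Android clients; Python is used here only to illustrate the logic, and the `Sample` type, function names, and sample values are assumptions. The 100 m and 30-second thresholds come from the example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float           # seconds since some reference time
    distance_m: float  # distance from the waypoint's latitude/longitude
    activity: str      # "walking", "stationary", or "driving"


def location_condition(samples, radius_m=100.0, min_seconds=30.0):
    """True if the courier has been continuously within radius_m for min_seconds."""
    start = None  # start time of the current uninterrupted run inside the radius
    for s in samples:  # samples are assumed to be time-ordered
        if s.distance_m <= radius_m:
            if start is None:
                start = s.t
        else:
            start = None
    return start is not None and samples[-1].t - start >= min_seconds


def fence_triggered(samples):
    """location_condition AND (motion_walking OR motion_stationary)."""
    if not samples:
        return False
    motion_ok = samples[-1].activity in ("walking", "stationary")
    return location_condition(samples) and motion_ok


# Made-up samples: the courier drives up, parks, and starts walking.
samples = [
    Sample(0, 150.0, "driving"),
    Sample(15, 80.0, "driving"),
    Sample(30, 40.0, "driving"),
    Sample(45, 20.0, "walking"),
]
print(fence_triggered(samples))
```

Expressing the fence as a composition of small boolean conditions is what makes it easy to experiment: a different fence is just a different expression over the same conditions.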

Why This Helped:

  • Allowed more precise timing of waypoint entry
  • Enabled easy experimentation with different conditions
  • Reduced reliance on backend detection of "Courier Arrived" events
  • Foundation for future use-cases (e.g. merchant notifications)

Conclusion

What We Achieved:

  • ✅ Reduced uncertainty → Established well-defined metrics for waypoint duration that were then optimized against
  • ✅ Progress through iteration → Improved through multiple iterations; some were short-term oriented and others long-term
  • ✅ Established foundations for future work → ML infrastructure and Fences API were immediately used for other projects