Waypoint Duration Estimate Project 🚴📦

Introduction

Project Duration: ~4 months
Team:
- 1 Backend Engineer (me)
- 1 Data Scientist
- 2 Mobile Engineers (Android & iOS)
Goal: Accurately estimate courier waypoint durations (pickups/drop-offs) across diverse conditions (vehicle type, geography, time of day, etc.)

Defining the Problem

What is a Waypoint?

A pickup or drop-off location in a delivery process. No other types of waypoints were considered.

Why are Waypoint Duration Estimates Critical?

They determine which courier is best suited for an assignment
If we underestimate the duration → new jobs get assigned too soon
If we overestimate the duration → couriers sit idle, slowing down deliveries

Example Scenario:

Courier A (10 min away)

Courier B (finishing nearby delivery)
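
To make the scenario concrete, here is a minimal sketch of how a waypoint duration estimate could feed courier selection. Only Courier A's 10-minute travel time comes from the scenario; the other numbers and all names (`Candidate`, `time_to_pickup_min`, `best_courier`) are hypothetical and are not the actual matching logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    remaining_waypoint_min: float  # estimated time left at the current waypoint
    travel_min: float              # travel time from there to the new pickup


def time_to_pickup_min(c: Candidate) -> float:
    """Total minutes until the courier can reach the new pickup."""
    return c.remaining_waypoint_min + c.travel_min


def best_courier(candidates: list[Candidate]) -> Candidate:
    """Pick the candidate who can reach the pickup soonest."""
    return min(candidates, key=time_to_pickup_min)


courier_a = Candidate("A", remaining_waypoint_min=0.0, travel_min=10.0)

# Hypothetical: B is 2 min from the pickup once the current drop-off is done.
b_if_short_dropoff = Candidate("B", remaining_waypoint_min=3.0, travel_min=2.0)
b_if_long_dropoff = Candidate("B", remaining_waypoint_min=10.0, travel_min=2.0)

print(best_courier([courier_a, b_if_short_dropoff]).name)  # B
print(best_courier([courier_a, b_if_long_dropoff]).name)   # A
```

The choice flips depending on the drop-off estimate alone, which is why an inaccurate estimate leads directly to a worse assignment.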

The Baseline Approach

What existed before?

Pickup durations: A simple lookup table with a few hardcoded values (e.g. fast food vs deep dish pizza)
Drop-off durations: A fixed value nationwide
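
A rough sketch of what such a baseline looks like in code. The 5-minute drop-off value is the one quoted in this deck; the pickup values, the fallback, and the function name `baseline_estimate` are made up for illustration.

```python
from typing import Optional

# Hypothetical pickup values in minutes; the original table's numbers are not given.
PICKUP_MINUTES = {"fast_food": 3.0, "deep_dish_pizza": 12.0}
DEFAULT_PICKUP_MINUTES = 5.0  # assumed fallback for unlisted business types
DROPOFF_MINUTES = 5.0         # one fixed value for every address nationwide


def baseline_estimate(waypoint_type: str, business_type: Optional[str] = None) -> float:
    """Return the baseline duration estimate in minutes."""
    if waypoint_type == "dropoff":
        return DROPOFF_MINUTES
    if waypoint_type == "pickup":
        return PICKUP_MINUTES.get(business_type, DEFAULT_PICKUP_MINUTES)
    raise ValueError(f"unknown waypoint type: {waypoint_type}")


# The destination never enters the drop-off calculation.
print(baseline_estimate("dropoff"))
print(baseline_estimate("pickup", "deep_dish_pizza"))
```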

Funny Example 🤯

A suburban house and a 50-story NYC high-rise both had a 5-minute drop-off estimate.

Reality? NYC deliveries take way longer!

Suburban House

🏠

🚲 ➡️ 📦 ➡️ 🚪

Quick & straightforward

~2-3 minutes

NYC High-rise

🏢

🚲 ➡️ 🚪 ➡️ 🛗 ➡️ 📦

Multiple steps & waiting

~8-12 minutes

Phase 1: Collecting Ground Truth Data

Goal: Measure real-world waypoint durations using GPS data.

How We Did It:

✅ Captured GPS coordinates from couriers every 15 seconds
✅ Filtered GPS coordinates to remove noise
✅ Calculated duration within X meters of waypoints for every delivery
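
The duration calculation can be sketched as follows. This is a simplified illustration, not the production pipeline: it assumes fixes arrive as `(unix_seconds, latitude, longitude)` tuples, uses the haversine formula for distance, and treats the visit as a single span from the first to the last fix inside the radius. The names `haversine_m` and `dwell_seconds` and the 50 m radius are assumptions.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0

Fix = Tuple[float, float, float]  # (unix_seconds, latitude, longitude)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dwell_seconds(fixes: List[Fix], wp_lat: float, wp_lon: float, radius_m: float) -> float:
    """Seconds between the first and last fix inside the radius (0 if none)."""
    inside = [t for t, lat, lon in fixes
              if haversine_m(lat, lon, wp_lat, wp_lon) <= radius_m]
    return max(inside) - min(inside) if inside else 0.0


# One fix every 15 seconds: approach, three fixes near the waypoint, departure.
fixes = [
    (0.0, 40.0100, -74.0),
    (15.0, 40.0003, -74.0),
    (30.0, 40.0001, -74.0),
    (45.0, 40.0002, -74.0),
    (60.0, 40.0100, -74.0),
]
print(dwell_seconds(fixes, 40.0, -74.0, radius_m=50.0))  # 30.0
```

With a 15-second sampling interval, the measured duration can undercount the true visit by up to one interval on each side, so the radius and the sampling rate both bound the achievable precision.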

Challenges:

GPS noise – Surprised to find out how noisy this data can be even on modern phones
Processing scale – Needed a low-overhead way to process very large amounts of data; we used Apache Beam (Dataflow)
Incorrect data – Some waypoints had incorrect GPS coordinates that couriers were working around on their own; we corrected the data upstream
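
The deck does not say how the GPS noise was filtered. One common approach, shown here purely as an assumption, is to drop fixes whose device-reported horizontal accuracy is poor and then median-smooth the remaining positions so isolated spikes are suppressed. The `Fix` fields, the 50 m threshold, and the window size are all hypothetical.

```python
from statistics import median
from typing import List, NamedTuple


class Fix(NamedTuple):
    t: float           # unix seconds
    lat: float
    lon: float
    accuracy_m: float  # horizontal accuracy radius reported by the device


def drop_inaccurate(fixes: List[Fix], max_accuracy_m: float = 50.0) -> List[Fix]:
    """Keep only fixes the device itself considers reasonably accurate."""
    return [f for f in fixes if f.accuracy_m <= max_accuracy_m]


def median_smooth(fixes: List[Fix], window: int = 3) -> List[Fix]:
    """Replace each position with the median over a centered window."""
    half = window // 2
    out = []
    for i, f in enumerate(fixes):
        nearby = fixes[max(0, i - half): i + half + 1]
        out.append(f._replace(lat=median(n.lat for n in nearby),
                              lon=median(n.lon for n in nearby)))
    return out


raw = [
    Fix(0.0, 40.0000, -74.0, 10.0),
    Fix(15.0, 40.0001, -74.0, 200.0),  # poor reported accuracy: dropped
    Fix(30.0, 40.0002, -74.0, 10.0),
    Fix(45.0, 40.0500, -74.0, 12.0),   # spike despite good reported accuracy
    Fix(60.0, 40.0003, -74.0, 10.0),
    Fix(75.0, 40.0004, -74.0, 10.0),
]
cleaned = median_smooth(drop_inaccurate(raw))
print(len(cleaned), max(f.lat for f in cleaned) < 40.001)
```

A median window handles single-fix spikes but not sustained drift, so a real pipeline would likely need additional checks.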

Phase 2: Early Impact

Goal: Quickly deliver value from our Phase 1 data collection without over-engineering.

What Changed?

Updated pickup estimates using fresher data
Moved to per-city, per-vehicle-type drop-off estimates
Added alerting on incorrect pickup waypoint locations (e.g. completed deliveries where couriers were never at the pickup location)
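
A minimal sketch of how per-city, per-vehicle-type drop-off estimates could be derived from the measured durations. Using the median as the statistic, requiring a minimum sample count, and falling back to the old fixed value are all assumptions, as are the sample data and the function names.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple

Observation = Tuple[str, str, float]  # (city, vehicle_type, measured drop-off minutes)
Key = Tuple[str, str]


def build_dropoff_table(observations: Iterable[Observation],
                        min_samples: int = 3) -> Dict[Key, float]:
    """Median drop-off minutes per (city, vehicle_type) with enough samples."""
    grouped = defaultdict(list)
    for city, vehicle, minutes in observations:
        grouped[(city, vehicle)].append(minutes)
    return {key: median(vals) for key, vals in grouped.items()
            if len(vals) >= min_samples}


def dropoff_estimate(table: Dict[Key, float], city: str, vehicle: str,
                     fallback_minutes: float = 5.0) -> float:
    """Look up the segment estimate, falling back to a fixed value."""
    return table.get((city, vehicle), fallback_minutes)


# Made-up measurements for illustration only.
observations = [
    ("nyc", "bike", 9.0), ("nyc", "bike", 11.0), ("nyc", "bike", 10.0),
    ("austin", "car", 3.0), ("austin", "car", 2.0), ("austin", "car", 4.0),
    ("austin", "bike", 6.0),  # too few samples: falls back
]
table = build_dropoff_table(observations)
print(dropoff_estimate(table, "nyc", "bike"))     # 10.0
print(dropoff_estimate(table, "austin", "bike"))  # 5.0
```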

Phase 3: ML Model

Goal: Use machine learning to make predictions more accurate.

Key Features Used in the Model:

Courier's vehicle type (bike, car, scooter)
City & Grid Location (S2 grid cells)
Time of Day (rush hour vs. late night)
Various pickup-specific features (e.g. business type, order size)
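
As an illustration of how such features might be turned into model inputs, the sketch below one-hot encodes the vehicle type and encodes the hour of day cyclically so that 23:00 and 01:00 end up close together. The encoding choices, the `encode_features` name, and the placeholder S2 cell string are assumptions; the deck does not describe the actual feature pipeline.

```python
import math
from typing import Dict, Union

VEHICLE_TYPES = ("bike", "car", "scooter")


def encode_features(vehicle_type: str, city: str, s2_cell: str,
                    hour_of_day: float) -> Dict[str, Union[float, str]]:
    """Build a flat feature dict for one waypoint."""
    if vehicle_type not in VEHICLE_TYPES:
        raise ValueError(f"unknown vehicle type: {vehicle_type}")
    angle = 2 * math.pi * (hour_of_day % 24) / 24
    features: Dict[str, Union[float, str]] = {
        f"vehicle_{v}": float(v == vehicle_type) for v in VEHICLE_TYPES
    }
    features.update({
        "city": city,          # categorical, left as a string here
        "s2_cell": s2_cell,    # placeholder; a real token would come from an S2 library
        "hour_sin": math.sin(angle),
        "hour_cos": math.cos(angle),
    })
    return features


print(encode_features("bike", "nyc", "example_cell", hour_of_day=18.5))
```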

Model Training & Deployment:

Implemented a Random Forest model in TensorFlow, optimized for Mean Absolute Error (MAE), which is less sensitive to outliers than squared-error metrics
Deployed via TensorFlow Serving in Kubernetes
Added model versioning (S3 bucket) and batch processing (over gRPC)
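
A small worked example of why MAE suits outlier-heavy duration data. The durations are invented; the point is that the best constant prediction under MAE is the median, which a single very long delivery barely moves, while under squared error it is the mean, which the outlier drags upward.

```python
from statistics import mean, median


def mae(actual, predicted):
    """Mean absolute error of a constant prediction."""
    return mean(abs(a - predicted) for a in actual)


def mse(actual, predicted):
    """Mean squared error of a constant prediction."""
    return mean((a - predicted) ** 2 for a in actual)


# Hypothetical drop-off durations in minutes, with one outlier.
durations = [3.0, 4.0, 4.0, 5.0, 40.0]

med = median(durations)  # 4.0, close to the typical delivery
avg = mean(durations)    # pulled upward by the 40-minute outlier

print("median:", med, "MAE:", mae(durations, med), "MSE:", mse(durations, med))
print("mean:  ", avg, "MAE:", mae(durations, avg), "MSE:", mse(durations, avg))
```

The median gives the lower MAE and the mean gives the lower MSE, so a model tuned on MAE is pulled toward typical deliveries rather than toward rare extreme ones.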

Impact: Significantly reduced mean absolute error, which allowed the matching/routing logic to be more aggressive

Phase 4: "Fences" API

Problem: Even with a highly accurate duration estimate, we had a problem identifying the actual moment couriers entered waypoints.

Solution: Evaluate waypoint entry on the client device (iOS/Android) through a geofence-like API

How It Worked:

✅ Client-Side Processing:

📱 Realtime: Direct access to GPS, accelerometer, and other sensors without additional battery drain from streaming
📊 Complete Sensor Dataset: Process raw sensor data locally instead of limiting what we stream to servers
⚡ Reduced Backend Load: High-frequency events are processed on device instead of on the backend

Sample conditions:

📍 Distance from latitude/longitude
⏳ Time spent within X meters of latitude/longitude
🚶‍♂️🚗 Motion activity (walking, stationary, driving)

Example API Logic:


location_condition AND (motion_walking OR motion_stationary)

Where:

location_condition = within 100m for 30+ sec
motion_walking = accelerometer detects walking
motion_stationary = accelerometer detects the courier is stationary
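
The example logic can be sketched as composable predicates. This is illustrative only: the real evaluation ran on the iOS/Android clients and its API shape is not shown in this deck, so `DeviceState` and every function name below are hypothetical. The 100 m and 30 s thresholds come from the example above.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    distance_m: float             # current distance to the waypoint
    seconds_within_radius: float  # continuous time spent inside the radius
    motion: str                   # "walking", "stationary", or "driving"


def location_condition(s: DeviceState, radius_m: float = 100.0,
                       min_seconds: float = 30.0) -> bool:
    """Within radius_m of the waypoint for at least min_seconds."""
    return s.distance_m <= radius_m and s.seconds_within_radius >= min_seconds


def motion_walking(s: DeviceState) -> bool:
    return s.motion == "walking"


def motion_stationary(s: DeviceState) -> bool:
    return s.motion == "stationary"


def courier_entered_waypoint(s: DeviceState) -> bool:
    """location_condition AND (motion_walking OR motion_stationary)."""
    return location_condition(s) and (motion_walking(s) or motion_stationary(s))


print(courier_entered_waypoint(DeviceState(40.0, 45.0, "walking")))  # True
print(courier_entered_waypoint(DeviceState(40.0, 45.0, "driving")))  # False
print(courier_entered_waypoint(DeviceState(40.0, 10.0, "walking")))  # False
```

Keeping each condition as a separate predicate is what makes it cheap to experiment with different combinations and thresholds.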

Why This Helped:

Allowed more precise timing of waypoint entry
Enabled easy experimentation with different conditions
Reduced reliance on backend detection of "Courier Arrived" events
Foundation for future use-cases (e.g. merchant notifications)

Conclusion

What We Achieved:

✅ Reduced uncertainty → Established well-defined waypoint duration metrics that we could then optimize against
✅ Progress through iteration → Improved over multiple iterations; some were short-term oriented and others long-term
✅ Established foundations for future work → ML infrastructure and Fences API were immediately used for other projects