
How We Reduced CRM Data Latency by 96%

Our sales and marketing teams were working with 24-hour-old data. A lead would sign up at 9 AM, and important engagement information or derived insights wouldn't be available until 9 AM the next day (sometimes longer or not at all). By then, the lead might have already moved on to a competitor or been marketed to in the wrong way.

We needed real-time CRM data. HubSpot offered a native reverse-ETL sync from Snowflake that looked perfect on paper. In practice, it broke immediately.

The problem? Cybrary uses UUIDs as the primary user identifier. HubSpot uses email addresses. When a user changed their email address or used several, the sync's email-keyed change detection broke down: updates were silently dropped and hard to detect. The native sync was fundamentally incompatible with our data model.
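To make the mismatch concrete, here's a minimal sketch (made-up records, not our actual schema) of how email-keyed matching loses track of a UUID-identified user after an email change:

```python
# Hypothetical records: the warehouse keys users by UUID,
# HubSpot keys contacts by email address.
warehouse_user = {"user_id": "a1b2c3d4-uuid", "email": "new@example.com"}
hubspot_contacts = {"old@example.com": {"contact_id": 42}}

# Email-keyed change detection: the user changed their email, so the
# lookup misses, the update looks like a brand-new contact, and the
# existing HubSpot record (contact 42) is silently orphaned.
match = hubspot_contacts.get(warehouse_user["email"])
print(match)  # None
```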

We had two options: pay for a reverse-ETL service like Hightouch, or build our own. We chose to build.

Here's how we did it, and why Snowflake's data share made it feasible.

Build vs Buy: The Decision

When the native sync failed, we evaluated our options:

Option A: Pay for Hightouch (or Census)

Option B: Build Custom

The decision hinged on one question: Could we actually build this quickly?

The answer was yes, thanks to HubSpot's Snowflake data share.

The Key Enabler: Snowflake Data Share

HubSpot provides a Snowflake data share that lets you query their CRM data directly from your Snowflake warehouse. This is a game-changer for building reverse-ETL.

Why it matters: we could offload the hardest part—finding which records need updating—to Snowflake. The custom code just needed to fetch the results and submit them to HubSpot's API.

What We Built

The architecture is simple: Snowflake finds the deltas, Cloud Run submits them to HubSpot.

The Delta Query

This is where Snowflake data share shines. Instead of scanning all 1M+ records in application code, we let Snowflake do the comparison:

-- Find contacts that need updating
WITH our_data AS (
  SELECT
    user_id,
    email,
    phone,
    properties,
    updated_at
  FROM warehouse.contacts
  WHERE is_active = true
    AND updated_at >= CURRENT_TIMESTAMP - INTERVAL '20 minutes'  -- 20-min lookback on a 15-min schedule, so runs overlap and can't miss rows
),
hubspot_data AS (
  SELECT
    contact_id,
    email,
    phone,
    properties
  FROM hubspot_share.contacts
)
SELECT o.*
FROM our_data o
LEFT JOIN hubspot_data h ON o.email = h.email
WHERE
  -- New contacts (not in HubSpot)
  h.contact_id IS NULL
  -- Or existing contacts with changes (null-safe, so NULL-to-value
  -- transitions aren't silently skipped the way != would skip them)
  OR o.phone IS DISTINCT FROM h.phone
  OR o.properties IS DISTINCT FROM h.properties
ORDER BY o.updated_at DESC

This query runs entirely in Snowflake and returns only the records that need updating. On average, that's 100-200 contacts per 15-minute window (out of 1M+ total).

The Cloud Run job just fetches these results and submits them to HubSpot's batch API. Simple, fast, cheap.
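HubSpot's batch endpoints cap how many records a single request can carry (the v3 batch APIs accept up to 100 inputs per call), so in practice the job slices the delta before submitting. A small helper like this (illustrative, not the original code) does the slicing:

```python
def chunk(records, size=100):
    """Split a list of records into HubSpot-sized batches."""
    return [records[i:i + size] for i in range(0, len(records), size)]

# e.g. a 250-row delta becomes three requests: 100 + 100 + 50
batches = chunk(list(range(250)))
print([len(b) for b in batches])  # [100, 100, 50]
```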

The Cloud Run Job

The actual sync logic is ~200 lines of Python:

import snowflake.connector
import requests

def sync_contacts():
    """Sync changed contacts from Snowflake to HubSpot"""

    # 1. Query Snowflake for delta
    contacts = execute_snowflake_query(DELTA_QUERY)

    # 2. Transform for HubSpot API
    hubspot_batch = [
        {
            "email": c["email"],
            "properties": {
                "phone": normalize_phone(c["phone"]),
                "user_id": c["user_id"],
                # ... more fields
            }
        }
        for c in contacts
    ]

    # 3. Batch upsert to HubSpot
    response = requests.post(
        "https://api.hubapi.com/contacts/v1/contact/batch/",
        json=hubspot_batch,
        headers={"Authorization": f"Bearer {HUBSPOT_API_KEY}"},
    )
    response.raise_for_status()  # surface API errors instead of failing silently

    # 4. Log results for monitoring
    log_sync_results(response)

    return {"contacts_synced": len(contacts)}

That's it. The complexity is in the SQL query (which Snowflake handles), not the application code.
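The helpers (execute_snowflake_query, normalize_phone, log_sync_results) make up the rest of those ~200 lines and aren't shown here. As one illustration, a minimal normalize_phone for US-style numbers might look like this (a hypothetical sketch, not the actual implementation):

```python
import re

def normalize_phone(raw, default_country_code="1"):
    """Normalize a phone number to E.164-style digits, e.g. +15551234567.

    Assumes US-style 10-digit numbers when no country code is present;
    returns None for empty or missing input.
    """
    if not raw:
        return None
    digits = re.sub(r"\D", "", raw)  # strip punctuation and spaces
    if len(digits) == 10:            # bare national number
        digits = default_country_code + digits
    return "+" + digits

print(normalize_phone("(555) 123-4567"))  # +15551234567
```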

Results

  • 96% latency reduction
  • 90% cost savings
  • Built in 3 days

Infrastructure cost breakdown: Cloud Run compute is minimal (~$1/month, scaling to zero between syncs). The bulk of the cost is Snowflake compute for the delta queries (~$20-30/month) and Cloud Composer for Airflow scheduling (~$20-30/month). Total: ~$40-60/month depending on query volume; call it ~$50/month.

Business impact: Sales and marketing teams can now act on lead data within 15 minutes instead of 24+ hours. Response time is one of the strongest predictors of conversion.

Unexpected Benefits

Beyond speed and cost, building custom unlocked capabilities we hadn't anticipated:

1. Git-Based Version Control

All sync logic lives in version control. We can:

2. Easy to Add New Syncs

Want to sync a new dataset? Just add a SQL file and a config block. We went from 1 sync to 5 syncs in a month without touching the core pipeline code.

Each sync is just:

No vendor approval needed. No new connectors to purchase. Just commit and deploy.
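As an illustration, a per-sync config block might look something like this (hypothetical field names; the post doesn't show the actual format):

```python
# Hypothetical per-sync config: one entry per dataset, pairing a
# delta SQL file with a HubSpot destination and a schedule.
DEALS_SYNC = {
    "name": "deals",
    "sql_file": "sql/deals_delta.sql",  # delta query run in Snowflake
    "hubspot_object": "deals",          # destination CRM object
    "schedule": "*/15 * * * *",         # every 15 minutes (Airflow cron)
    "batch_size": 100,                  # records per HubSpot API call
}
```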

3. Custom Logging and Alerting

We built exactly the observability we needed:

With vendor solutions, you get their logging. With custom, you get exactly what you need.

The Compounding Value of Custom

We started with one sync: contacts from Snowflake to HubSpot.

But once the architecture was proven, adding new syncs was trivial. Each new sync was just:

Over the next several months, we added:

By the time I left: 15 syncs running on the same infrastructure.

Hightouch cost trajectory: 2 syncs (free) → 10 syncs (Starter) → Enterprise pricing (need to talk to sales, estimated $3,000+/month)

This is the real power of custom: vendor costs scale with features and usage, while custom costs stay nearly flat as you add syncs.

When to Build vs Buy

Not every company should build their reverse-ETL pipeline. Here's when it makes sense:

Build when:

  • You have Snowflake data share (or similar) access to your destination system
  • Native integrations don't fit your data model
  • You need custom transformation logic
  • You're syncing high volumes (vendor costs scale with rows)
  • You have 2-3 days of engineering time available

Buy when:

  • You don't have data share access (would need to build ETL first)
  • Native integration works fine for your use case
  • Engineering time is scarce
  • You're syncing to multiple destinations (vendor handles this)

The ROI Calculation

For us, the math was clear: roughly $50/month to run it ourselves, versus vendor pricing that climbs toward an estimated $3,000+/month at our sync count.

Plus the intangible benefits: full control, git-based versioning, custom observability, and the ability to add new syncs without vendor costs.

Key Takeaways

Three lessons from building this:

1. Data Shares Change the Build vs Buy Equation

Without Snowflake data share, this would've taken weeks (build ETL, then build sync logic). With it, we built in days. Data shares make custom integrations feasible for smaller teams.

2. Keep the Cloud Run Job Simple

All the complexity lives in SQL (which Snowflake is great at). The application code just orchestrates API calls. This makes it easy to test, debug, and maintain.

3. Start with One Sync, Then Scale

We built the contact sync first. Once that worked, adding new syncs (deals, companies, custom objects) was trivial. Start small, validate the architecture, then expand.

Conclusion

If you're facing a similar decision at your company, building custom might be faster and cheaper than you think.

Need help with your data infrastructure?

I help B2B SaaS companies optimize their data stacks through FinOps audits and hands-on implementation work.

Whether you're evaluating build vs buy decisions, migrating between systems, or optimizing existing pipelines, let's talk.