Do you know your system’s data source classifications?

Knowing this can make or break your enterprise. I’m currently leading a massive system migration where the old system has:

  • Data sourced from multiple upstream systems
  • Highly operational data
  • New data that originates in this system and is held nowhere else
  • A whole lot of “we think this is right” gray areas

Chaos will catch up to you if you don’t get this straight. There are four classifications you must define clearly:

  • Source of Origin (SoO): the initial entry point into your data ecosystem
  • Source of Record (SoR): the authoritative system for maintaining that data
  • Source of Reference (SoRf): a non-authoritative copy used for operations or reporting
  • Source of Truth (SoT): the trusted, definitive version after validation and enrichment

Why does this matter in a large ecosystem? These classifications guide how data should be integrated, maintained, retained, and referenced for legal purposes. Here is what each classification tells you:

Source of Origin:

  • Helps trace data quality issues back to their root cause
  • Determines data ownership and accountability for accuracy

Source of Record:

  • Prevents conflicting updates across multiple systems
  • Establishes clear ownership for data maintenance and corrections
  • Reduces data inconsistencies and duplication
  • Defines where to write changes when updates are needed

Source of Reference:

  • Identifies which systems are consumers vs producers of data
  • Prevents accidental modifications to data in non-authoritative systems
  • Helps design proper data flow and integration patterns
  • Clarifies which systems can be rebuilt/refreshed from authoritative sources without data loss
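
The write-path rule implied by the Source of Record and Source of Reference lists can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not a real integration: the `SOR_BY_DOMAIN` mapping, the system names, and the `route_write` helper are all hypothetical.

```python
# Hypothetical mapping of data domain -> Source of Record (assumed names).
SOR_BY_DOMAIN = {
    "customer": "crm",
    "payment": "erp",
}


class NonAuthoritativeWrite(Exception):
    """Raised when a change targets a system that is not the SoR."""


def route_write(domain: str, target_system: str) -> str:
    """Return the system a change should be written to, or raise."""
    sor = SOR_BY_DOMAIN.get(domain)
    if sor is None:
        # No SoR has been agreed for this domain: one of the gray areas.
        raise KeyError(f"No Source of Record defined for domain {domain!r}")
    if target_system != sor:
        # Any other system holds, at best, a reference copy of this domain.
        raise NonAuthoritativeWrite(
            f"{target_system!r} is not the SoR for {domain!r}; write to {sor!r}"
        )
    return sor
```

Calling `route_write("customer", "crm")` returns `"crm"`, while `route_write("customer", "pos")` raises `NonAuthoritativeWrite`. Failing loudly is a design choice here; a real pipeline might instead queue the change and forward it to the SoR.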

Source of Truth:

  • Eliminates confusion about which version of data to trust
  • Enables consistent reporting and decision-making across the organization

While these are distinct classifications, a single system can fit more than one. For example, let’s assume your business takes orders online or over the phone. If your call center enters orders into a Point-of-Sale (PoS) application and you take orders through an eCommerce site, both the PoS and eCommerce systems become a Source of Origin (SoO).

If both systems sync into your ERP, such as NetSuite, and sync customer data to your CRM, Salesforce, then Salesforce becomes your Source of Record (SoR) for customers and NetSuite becomes your SoR for payments. The eCommerce and PoS systems are then both SoO and Source of Reference (SoRf). If you then use Snowflake to build golden records and household your data, Snowflake becomes the Source of Truth (SoT).
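
One way to make the example above answerable on demand is a small classification registry. The sketch below is illustrative only: the `Role` enum, the `CLASSIFICATIONS` table, and `systems_with` are hypothetical names, and the data domains attached to each system (for instance, treating eCommerce and PoS as sources for `"order"` data and Snowflake as SoT for `"customer"` data) are assumptions layered on top of the example.

```python
from enum import Enum


class Role(Enum):
    SOO = "Source of Origin"
    SOR = "Source of Record"
    SORF = "Source of Reference"
    SOT = "Source of Truth"


# (system, data domain) -> roles. Domains are assumed for illustration.
CLASSIFICATIONS = {
    ("ecommerce", "order"): {Role.SOO, Role.SORF},
    ("pos", "order"): {Role.SOO, Role.SORF},
    ("netsuite", "payment"): {Role.SOR},
    ("salesforce", "customer"): {Role.SOR},
    ("snowflake", "customer"): {Role.SOT},
}


def systems_with(role: Role, domain: str) -> list[str]:
    """List the systems that hold the given role for a data domain."""
    return sorted(
        system
        for (system, d), roles in CLASSIFICATIONS.items()
        if d == domain and role in roles
    )


print(systems_with(Role.SOO, "order"))      # ['ecommerce', 'pos']
print(systems_with(Role.SOR, "customer"))   # ['salesforce']
```

Keying the table by system and domain, rather than by system alone, reflects the point that one system can hold different classifications for different data.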

If you incorrectly treat your eCommerce system as a SoR, you may miss orders in your PoS system when generating financial reports, leading to revenue discrepancies that auditors will flag.

If you update a customer’s data in your PoS system but not in your CRM, which is the SoR, a refund business process that replenishes stock quantities may not run.

If you don’t know your Sources of Origin, then when a data quality issue surfaces (say, malformed phone numbers) you won’t know whether to fix your eCommerce validation, your PoS entry training, or both. You’ll waste time treating symptoms instead of causes.

Without Snowflake as your established SoT, your finance team pulls from NetSuite while sales pulls from Salesforce. Now you’re in a meeting arguing about which customer count is “real” instead of making decisions.
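
To make the malformed phone number scenario concrete, here is a minimal sketch of tracing a data quality issue back to its Source of Origin. It assumes every record carries an `origin` tag naming the system that first captured it, and it uses a deliberately simple validity rule (exactly 10 digits once punctuation is stripped). The field names, the rule, and the sample records are all made up for illustration.

```python
import re
from collections import Counter


def is_valid_phone(raw: str) -> bool:
    """Assumed rule for illustration: exactly 10 digits after stripping non-digits."""
    digits = re.sub(r"\D", "", raw)
    return len(digits) == 10


def malformed_by_origin(records) -> Counter:
    """Count malformed phone numbers per Source of Origin."""
    counts = Counter()
    for record in records:
        if not is_valid_phone(record["phone"]):
            counts[record["origin"]] += 1
    return counts


# Made-up sample records, each tagged with the system that first captured it.
records = [
    {"origin": "ecommerce", "phone": "555-010-1234"},
    {"origin": "pos", "phone": "555-0101"},
    {"origin": "pos", "phone": "n/a"},
]

print(malformed_by_origin(records))  # Counter({'pos': 2})
```

In this made-up sample every bad number comes from the PoS system, which would point at entry training rather than eCommerce validation.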

Accurately classifying the data in each system helps you avoid bad data decisions. It points people to the right location and communicates the architectural limits that should be placed on each system. It removes the gray areas about where data originates, who owns it, and how it should be used. In my migration work, the teams who can answer “which system is SoR for X?” in under five seconds are the ones who ship on time. The ones who can’t are still arguing about it in year two.
