If you've ever opened a supplier file and thought "why can't they just send it in a normal format?" — you're not alone. Every product data team has had that moment. But here's the thing: suppliers aren't sending messy data on purpose. They're exporting from their systems, which were built for their needs, not yours.
Understanding why supplier data varies so widely is more useful than being frustrated by it. Once you see the patterns, you can stop treating each new supplier file as a surprise and start building a process that handles the variation by design.
Reason 1: Different Source Systems
Your suppliers don't all run the same software. One might export from SAP, another from a custom-built inventory system from 2009, and a third from a spreadsheet that someone on their team maintains by hand.
Each of these systems has its own internal data model, its own export format, and its own quirks. An ERP export might give you fixed-width columns with cryptic field codes. A handmade Excel file might have merged cells, color-coded rows, and notes in a "Comments" column. A legacy database export might truncate field names to eight characters because that was the limit in 1997.
None of these are wrong — they're just different. The supplier's system works fine for the supplier. The problem only appears when you need to combine data from all of them into a single, consistent catalog.
Reason 2: Regional Conventions
If you work with suppliers across countries, you've seen this firsthand. A German supplier sends Farbe where a French one sends Couleur and a US one sends Color. All three mean the same thing, but nothing in the files tells you that.
It goes deeper than language. Number formats differ: 1.299,00 in Germany means the same as 1,299.00 in the US. Date formats are a classic trap — is 03/04/2026 March 4th or April 3rd? Measurement units vary too: centimeters from European suppliers, inches from American ones, and occasionally a mix of both in the same file.
Even boolean values vary by region. You'll see Ja/Nein, Oui/Non, Y/N, true/false, 1/0 — all representing the same yes-or-no answer.
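These regional conventions can't be guessed from the file alone; the locale has to be part of what you know about each supplier. A minimal sketch of per-supplier normalization, assuming you've recorded the decimal separator and date format somewhere (the function names and the boolean table here are illustrative, not from any particular library):

```python
from datetime import datetime

# Illustrative boolean translations for the variants mentioned above.
BOOLEAN_MAP = {
    "ja": True, "nein": False,      # German
    "oui": True, "non": False,      # French
    "y": True, "n": False,
    "true": True, "false": False,
    "1": True, "0": False,
}

def parse_decimal(raw: str, decimal_sep: str) -> float:
    """Convert '1.299,00' (decimal_sep=',') or '1,299.00' (decimal_sep='.')."""
    if decimal_sep == ",":
        raw = raw.replace(".", "").replace(",", ".")
    else:
        raw = raw.replace(",", "")
    return float(raw)

def parse_date(raw: str, fmt: str) -> datetime:
    """The format must come from supplier metadata -- '03/04/2026'
    by itself cannot tell you whether it is day-first or month-first."""
    return datetime.strptime(raw, fmt)

def parse_bool(raw: str) -> bool:
    return BOOLEAN_MAP[raw.strip().lower()]

print(parse_decimal("1.299,00", ","))        # 1299.0
print(parse_decimal("1,299.00", "."))        # 1299.0
print(parse_date("03/04/2026", "%d/%m/%Y"))  # April 3rd for a day-first feed
print(parse_bool("Ja"))                      # True
```

The key design point: the locale settings live in supplier metadata, not in the parsing code, so the same functions serve every supplier.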
Reason 3: No Universal Product Data Standard
There is no single, universally adopted standard for how product data should be structured. Standards do exist — GS1, BMEcat, ETIM, UNSPSC — but adoption is fragmented. A supplier in industrial parts might use ETIM. A consumer goods supplier might follow GS1 conventions. A small furniture maker probably follows no standard at all.
Even within a single industry, you'll find suppliers at different levels of data maturity. Some have dedicated PIM systems with well-structured attribute sets. Others maintain everything in a single spreadsheet with free-text descriptions. Both are valid ways to run a business — they just produce very different data exports.
This isn't anyone's fault. Unlike financial data (which has decades of standardization behind it), product data is too varied across industries for a single schema to cover everything from screws to sofas to software licenses.
Reason 4: File Format Preferences
Beyond the data itself, the container it arrives in varies just as much:
- CSV files — but with different delimiters (comma, semicolon, tab), different encodings (UTF-8, Latin-1, Windows-1252), and different quoting rules
- Excel files — sometimes with data on the first sheet, sometimes on the third, sometimes split across multiple tabs with a "Legend" sheet explaining the codes
- XML feeds — with custom schemas, nested structures, and namespaces that vary by supplier
- JSON APIs — where the same data might be nested three levels deep or completely flat, depending on who built the endpoint
- PDF catalogs — yes, some suppliers still send product data embedded in formatted PDF documents, expecting you to extract it yourself
Each format brings its own parsing challenges. And when a supplier says "we'll send you a CSV," that tells you surprisingly little about what you'll actually receive.
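In practice, "we'll send you a CSV" usually means detecting the delimiter and encoding yourself. A best-effort sketch using Python's standard library, with the candidate encodings as an assumption you'd adjust to what your suppliers actually send:

```python
import csv
import io

def read_supplier_csv(raw_bytes: bytes) -> list[list[str]]:
    """Read a 'CSV' whose delimiter and encoding are unknown in advance."""
    text = None
    # Try strict encodings first; latin-1 last, since it never fails.
    for encoding in ("utf-8", "cp1252", "latin-1"):
        try:
            text = raw_bytes.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    if text is None:
        raise ValueError("none of the candidate encodings worked")

    # csv.Sniffer guesses the delimiter from a sample of the file.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t")
    return list(csv.reader(io.StringIO(text), dialect))

rows = read_supplier_csv("sku;colour;price\nA-100;rot;1.299,00\n".encode("utf-8"))
print(rows[0])  # ['sku', 'colour', 'price']
```

Even this sketch glosses over real-world wrinkles such as byte-order marks, inconsistent quoting, and header rows that start on line three, which is exactly why "CSV" tells you so little.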
Reason 5: Schema Evolution
Supplier data isn't static. Columns get added, renamed, or removed — often without notice. A supplier upgrades their ERP and suddenly product_color becomes item_colour. A new product line introduces attributes that didn't exist before. An old attribute disappears because the intern who maintained it left the company.
This kind of schema drift is one of the most frustrating sources of data inconsistency because it breaks processes that were working fine yesterday. Your import script ran perfectly for six months, and then one morning it fails because a column moved.
Suppliers rarely announce these changes in advance. They're often not even aware the change affects you, because on their end it was a minor system update.
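Since suppliers won't warn you, the defensive move is to compare each incoming feed's columns against the columns you saw last time and fail loudly instead of importing silently. A small sketch, with illustrative column names:

```python
def diff_schema(expected: set[str], received: set[str]) -> dict:
    """Report columns that disappeared or newly appeared in a feed."""
    return {
        "missing": sorted(expected - received),  # columns that vanished
        "new": sorted(received - expected),      # columns that appeared
    }

last_known = {"sku", "product_color", "price"}
todays_feed = {"sku", "item_colour", "price", "eco_label"}

drift = diff_schema(last_known, todays_feed)
print(drift)  # {'missing': ['product_color'], 'new': ['eco_label', 'item_colour']}
```

A check like this turns "the import script failed mysteriously one morning" into "the feed changed, here is exactly how."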
What This Means for Your Team
The combined effect of these five factors leads to an uncomfortable truth: you cannot standardize what your suppliers send you. You can ask. You can send them templates. You can write specifications. Some suppliers will follow them. Many won't, or they'll follow them partially, or they'll follow them for a while and then drift back to their own format.
This isn't because suppliers are difficult. It's because conforming to your specific data format is a low priority for them. They have hundreds of customers, each with their own requirements. They're going to export from their system in the way that's easiest for them.
What you can control is what happens after the data arrives. You need a translation layer — something that sits between "raw supplier data" and "your internal schema" and handles the conversion reliably, every time.
Three Approaches to the Translation Problem
Approach 1: Manual Cleanup Every Time
This is where most teams start. A supplier sends a file. Someone on the team opens it, renames columns, fixes values, converts units, and saves a clean version. It works, and for one or two suppliers it's perfectly reasonable.
The problem is that the effort scales linearly with the number of suppliers. Ten suppliers sending monthly updates means someone spends days every month doing repetitive cleanup. The knowledge of how to clean each supplier's data lives in one person's head. When that person is on vacation or leaves the company, the process breaks.
Approach 2: Custom Scripts Per Supplier
The natural next step is automation: write a Python script or a set of Excel macros that handles each supplier's format. This is a real improvement. The logic is codified, repeatable, and faster than manual work.
But it has its own failure modes. Scripts become unmaintainable as the number of suppliers grows. They're usually written by one developer who understands the quirks. When a supplier changes their format, the script breaks and needs debugging. Over time, you end up with a folder of fragile scripts, each with its own assumptions, and no one wants to touch them.
Approach 3: Supplier Adapters — Map Once, Apply Forever
The third approach treats supplier data mapping as a first-class concern rather than an afterthought. Instead of writing disposable scripts, you create a structured adapter for each supplier that defines: how their columns map to yours, how their values translate to your vocabulary, and what unit conversions apply.
Once the adapter is configured, it applies automatically every time that supplier sends new data. When something unexpected appears — a new column, an unknown value — it gets flagged for review rather than silently breaking or passing through raw.
This is the approach that scales. Not because it eliminates work entirely (you still need to set up each adapter and review edge cases), but because it makes the work cumulative. Every mapping you define makes future imports from that supplier faster and more reliable.
Building Your Translation Layer
Whichever approach you choose, the core ingredients are the same:
- Column mapping per supplier — a clear record of which supplier field corresponds to which internal attribute
- Value mapping per attribute — a translation table from supplier-specific values to your canonical vocabulary
- Unit and format rules — how to convert measurements, dates, and number formats from each supplier's conventions to yours
- Unknown value handling — a defined process for what happens when something new appears that your rules don't cover
This is what we built FeedPrep to do. You create a supplier adapter that captures these rules, and it applies them on every future feed from that supplier. It doesn't magically know your mappings on day one — you teach it your rules, and then it applies them consistently going forward. When new values appear that aren't covered by existing rules, they land in a review inbox rather than silently entering your catalog.
But whether you use FeedPrep or build your own solution, the principle matters more than the tool: stop fighting supplier formats and build a system that translates them.
A More Generous View of Supplier Data
Once you understand why supplier data varies, it's easier to work with it constructively. Your suppliers aren't the enemy. The lack of universal standards is. Suppliers are doing their best with their own systems, their own regional conventions, and their own priorities.
The teams that handle product data most effectively are the ones that accept this reality early and invest in a proper translation layer — rather than spending energy trying to get every supplier to conform to a template they'll inevitably drift away from.