Bad data doesn't announce itself. There's no alert that fires when a supplier drops the weight field from half their products, or when someone enters a price of €0.00, or when 200 SKUs quietly vanish between one feed and the next.
Instead, it sneaks in. It passes through your import pipeline, lands in your PIM or store, and causes problems downstream: broken shipping calculations, duplicate filter options, wrong prices on the storefront, missing product images. By the time someone notices, the damage is already customer-facing.
The fix is straightforward: check the data before it enters your system. Here are five checks that catch the vast majority of supplier data issues. You can run them manually with a script, build them into a spreadsheet, or use a dedicated tool — the important thing is that they happen on every feed, every time.
Check 1: Completeness
The most basic question: are the required fields actually filled in?
This sounds obvious, but supplier feeds are rarely 100% complete. The issue is that partial data can look fine at a glance — you see rows and columns and values — until you realize that 60% of products are missing a critical attribute.
A completeness check calculates the fill rate for every attribute across the entire feed. Here's what a completeness report might look like for a single supplier:
| Attribute | Fill Rate | Status |
|---|---|---|
| product_name | 100% | OK |
| price | 95% | OK |
| ean | 88% | Warning |
| image_url | 72% | Warning |
| weight | 40% | Fail |
| category | 100% | OK |
| description | 63% | Warning |
In this example, 95% of products have a price (the remaining 5% need investigation), but only 40% have a weight. If your store uses weight for shipping calculations, that means 60% of products will either show wrong shipping costs or fail at checkout. That's not a minor data gap — it's a broken customer experience.
What to define up front: Which attributes are required, which are recommended, and what fill-rate threshold constitutes a pass. A feed with 100% of names but 40% of weights might be acceptable for a draft import but not for a production push.
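A completeness check is a few lines of code. Here's a minimal sketch in Python, assuming the feed has been loaded as a list of dicts (for example via `csv.DictReader`); the 95% threshold is an illustrative default, not a recommendation:

```python
def fill_rates(rows):
    """Return {attribute: fraction of rows with a non-empty value}."""
    if not rows:
        return {}
    counts = {key: 0 for key in rows[0]}
    for row in rows:
        for key, value in row.items():
            if value is not None and str(value).strip() != "":
                counts[key] += 1
    return {key: count / len(rows) for key, count in counts.items()}

def completeness_report(rows, required, threshold=0.95):
    """Flag required attributes whose fill rate falls below the threshold."""
    rates = fill_rates(rows)
    return {
        attr: ("OK" if rates.get(attr, 0.0) >= threshold else "FAIL",
               rates.get(attr, 0.0))
        for attr in required
    }
```

In practice you'd run this per supplier and keep separate thresholds for required versus recommended attributes.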
Check 2: Format Consistency
Even when values are present, they might not be in the right format. This is especially common with numeric fields, dates, and any attribute where the supplier's system has loose input validation.
Consider a price field across four suppliers:
| Supplier | Raw Value | Problem |
|---|---|---|
| Supplier A | €29.99 | Currency symbol embedded in value |
| Supplier B | 29,99 | Comma as decimal separator |
| Supplier C | 29.99 EUR | Currency suffix appended |
| Supplier D | 2999 | Value in cents, no decimal |
All four represent the same price, but if you import them into a numeric field without checking, you'll get errors or — worse — silently wrong data. 2999 treated as euros instead of cents means a product listed at nearly three thousand euros.
Format consistency checks look for:
- Numeric fields that contain non-numeric characters (currency symbols, unit suffixes, stray text)
- Date fields with mixed formats (`03/24/2026` vs. `24.03.2026` vs. `2026-03-24`)
- Boolean fields with inconsistent representations (`yes`, `Y`, `1`, `true`, `Ja`)
- EAN/GTIN codes with wrong lengths or non-numeric characters
- URL fields that aren't valid URLs (missing protocol, spaces, relative paths)
The key insight: format problems are usually consistent per supplier. Supplier B will always use comma decimals. Once you identify the pattern, you can write a transformation rule. But first you have to catch it.
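As a concrete example, here is a hedged sketch of a normalization rule covering the four price variants in the table above. It is deliberately narrow — real supplier rules belong in per-supplier configuration, and the `cents` flag is an assumption about how you'd encode Supplier D's pattern:

```python
import re

def normalize_price(raw, cents=False):
    """Strip currency symbols/suffixes and unify the decimal separator."""
    # Keep only digits, separators, and sign: drops "€", "EUR", spaces.
    cleaned = re.sub(r"[^\d,.\-]", "", str(raw))
    # A comma with no dot is a decimal comma: "29,99" -> "29.99".
    if "," in cleaned and "." not in cleaned:
        cleaned = cleaned.replace(",", ".")
    value = float(cleaned)
    # Supplier-specific rule: value was exported in cents.
    return value / 100 if cents else value
```

The check itself is the inverse of this: flag any raw value that doesn't already parse as a plain decimal, then decide which transformation rule applies.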
Check 3: Value Validity
A value can be present and correctly formatted and still be wrong. Validity checks test whether values fall within reasonable, expected ranges.
Examples of invalid-but-well-formatted values:
- Weight: 0.001 kg — technically valid, but almost certainly a data entry error. Very few products weigh one gram.
- Weight: 9999 kg — a common placeholder value that should have been replaced with real data.
- Price: €0.00 — a free product, or a missing price disguised as a zero?
- Price: €999,999.99 — probably a placeholder or test value.
- Width: 3 mm — for a piece of furniture? Likely a unit error (should be 3 m or 300 cm).
- Year: 2099 — in a "year of manufacture" field, this is clearly wrong.
Validity rules are specific to the attribute and often to the product category. A weight of 0.01 kg is suspicious for furniture but perfectly normal for jewelry. A price of €5,000 is unusual for office supplies but expected for electronics.
Practical approach: Start with wide boundaries and tighten them over time. Flag outliers for manual review rather than rejecting them outright — sometimes the weird values are correct, and you'll learn which ranges actually matter for your catalog.
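A simple way to encode per-category ranges is a lookup table with a generic fallback. The boundaries below are illustrative assumptions, not recommendations — start wider and tighten as the article suggests:

```python
# (category, attribute) -> (low, high); "any" is a category-agnostic fallback.
RANGES = {
    ("furniture", "weight_kg"): (0.5, 500),
    ("jewelry", "weight_kg"): (0.001, 2),
    ("any", "price_eur"): (0.01, 50_000),
}

def validity_flag(category, attribute, value):
    """Return None if the value is in range, else a human-readable flag."""
    bounds = RANGES.get((category, attribute)) or RANGES.get(("any", attribute))
    if bounds is None:
        return None  # no rule defined: pass through rather than reject
    low, high = bounds
    if not (low <= value <= high):
        return f"{attribute}={value} outside [{low}, {high}] for {category}"
    return None
```

Note that out-of-range values produce a flag for review, not a rejection, matching the practical approach above.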
Check 4: Referential Integrity
This check answers the question: do the values in this feed match my controlled vocabulary?
Every e-commerce system has lists of allowed values for certain attributes — categories, brands, colors, materials, sizes. When a supplier sends a value that isn't in your list, it creates a mismatch. The value might not be wrong per se, but it doesn't match your system.
| Attribute | Supplier Value | Your Value | What Happens |
|---|---|---|---|
| color | Navy Blue | Dark Blue | Duplicate filter option in store |
| brand | ACME Corp. | ACME | Same brand listed twice |
| category | Sofas & Couches | Sofas | Products in wrong/extra category |
| material | Genuine Leather | Leather | Duplicate material filter |
| size | Extra Large | XL | Confusing size selector |
The consequence is usually duplicate filter options. Your store shows both "Navy Blue" and "Dark Blue" as separate colors, splitting inventory across two filters. Customers searching for blue products miss half of them.
A referential integrity check compares every categorical value in the feed against your controlled vocabulary and flags anything that doesn't match. The fix is usually a value mapping rule ("when supplier sends Navy Blue, map to Dark Blue") — but you have to detect the mismatch first.
Watch out for: New values that are legitimate additions to your vocabulary. Not every mismatch is wrong — sometimes the supplier is sending a genuinely new color or material you need to add to your system. The check should flag these for human review, not auto-reject them.
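The check-plus-mapping flow can be sketched in a few lines. The vocabulary and mapping entries mirror the table above and are examples only — in practice both would live in configuration, not code:

```python
VOCAB = {"color": {"Dark Blue", "Red", "Green"}}
MAPPINGS = {("color", "Navy Blue"): "Dark Blue"}

def resolve_value(attribute, raw):
    """Return (resolved_value, status): 'ok', 'mapped', or 'unknown'."""
    if raw in VOCAB.get(attribute, set()):
        return raw, "ok"
    mapped = MAPPINGS.get((attribute, raw))
    if mapped is not None:
        return mapped, "mapped"
    # Not in the vocabulary and no mapping rule: queue for human review,
    # since it may be a legitimate new value to add to the system.
    return raw, "unknown"
```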
Check 5: Feed-Over-Feed Consistency
The first four checks look at a single feed in isolation. This one compares the current feed to the previous one. It catches a different class of problems: things that changed when they probably shouldn't have, or changed more dramatically than expected.
What to compare between feeds:
- Product count delta: The last feed had 1,200 products, this one has 980. Where did 220 products go? Was that intentional (end of season) or an error (truncated export)?
- Price changes: 50 products had price increases above 20%. Bulk pricing error, or legitimate repricing?
- New attribute patterns: A column that was always filled is suddenly empty for 30% of rows. Did the supplier change their export template?
- Schema changes: The feed has two new columns and one column is missing compared to last time. Was this communicated?
Feed-over-feed comparison is the check most teams skip because it requires keeping historical data. But it catches the most dangerous class of errors: the ones where each individual row looks fine, but the feed as a whole is wrong.
Real-world example: A supplier accidentally exports only their "active" products instead of the full catalog. The feed looks perfectly clean — all values present, all formats correct. But 200 products silently disappeared. Without a feed-over-feed comparison, you wouldn't notice until a customer tries to order something that's no longer in your system.
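The infrastructure needed is modest: keep the previous feed keyed by SKU and diff it against the current one. A minimal sketch, assuming feeds reduced to `{sku: price}` dicts and a 20% price-jump threshold as an illustrative default:

```python
def feed_delta(previous, current, price_jump=0.20):
    """Compare two feeds keyed by SKU; return suspicious changes."""
    prev_skus, curr_skus = set(previous), set(current)
    shared = prev_skus & curr_skus
    big_price_moves = {
        sku for sku in shared
        if previous[sku] > 0
        and abs(current[sku] - previous[sku]) / previous[sku] > price_jump
    }
    return {
        "removed": prev_skus - curr_skus,   # products that vanished
        "added": curr_skus - prev_skus,     # products that appeared
        "price_moves": big_price_moves,     # changes beyond the threshold
    }
```

The same diff pattern extends to fill rates and column sets to catch the "suddenly empty column" and schema-change cases.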
Building a Quality Gate
The important thing about these five checks is when they run. If you check data after it's already in your PIM or store, you're doing damage control. The data is live, customers are seeing it, and you're racing to fix problems that are already causing harm.
A quality gate runs these checks before the import. The data sits in a staging area while the checks run. If it passes, it flows into your system automatically. If it fails, it gets held for review.
A practical quality gate has three tiers:
- Pass: All checks within acceptable thresholds. Data imports automatically.
- Warning: Some checks flagged minor issues (e.g., a few new category values, slight completeness drop). Data imports, but someone reviews the flags within 24 hours.
- Fail: Critical issues detected (e.g., product count dropped 20%, required fields missing at scale, prices changed dramatically). Import is blocked until a human reviews and approves.
The thresholds are yours to define and will evolve over time. Start permissive and tighten as you learn which failures are noise and which ones matter.
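The gate decision itself reduces to a small function. A hedged sketch, assuming each check has already been collapsed to a `"pass"` / `"warn"` / `"fail"` result:

```python
def gate_decision(check_results):
    """Map a list of 'pass'/'warn'/'fail' check results to a gate action."""
    if "fail" in check_results:
        return "blocked"              # hold until a human reviews and approves
    if "warn" in check_results:
        return "import_with_review"   # import now, review flags within 24h
    return "auto_import"
```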
You can implement this with a script that runs on each incoming file, a cron job that processes a staging folder, or a tool built for the purpose. FeedPrep, for example, runs feed health reports on every incoming supplier feed and flags issues before data enters your workflow. But the principle is the same regardless of implementation: check first, import second.
Start With One Check, Then Stack
You don't need all five checks on day one. If you're doing nothing today, start with completeness — it's the easiest to implement and catches the most common problems. A simple script that counts non-empty values per column and flags anything below your threshold will catch more issues than you expect.
Once completeness is working, add format consistency. Then validity ranges. Then referential integrity. Feed-over-feed comparison comes last because it requires the most infrastructure (you need to store and compare historical feeds), but it's also the one that catches the most subtle and dangerous problems.
The goal isn't perfection. It's a repeatable process that catches the obvious problems before they become customer-facing. Five checks, run consistently, will prevent the vast majority of data quality incidents.