Bad data doesn't announce itself. There's no alert that fires when a supplier drops the weight field from half their products, or when someone enters a price of €0.00, or when 200 SKUs quietly vanish between one feed and the next.

Instead, it sneaks in. It passes through your import pipeline, lands in your PIM or store, and causes problems downstream: broken shipping calculations, duplicate filter options, wrong prices on the storefront, missing product images. By the time someone notices, the damage is already customer-facing.

The fix is straightforward: check the data before it enters your system. Here are five checks that catch the vast majority of supplier data issues. You can run them manually with a script, build them into a spreadsheet, or use a dedicated tool — the important thing is that they happen on every feed, every time.

Check 1: Completeness

The most basic question: are the required fields actually filled in?

This sounds obvious, but supplier feeds are rarely 100% complete. The issue is that partial data can look fine at a glance — you see rows and columns and values — until you realize that 60% of products are missing a critical attribute.

A completeness check calculates the fill rate for every attribute across the entire feed. Here's what a completeness report might look like for a single supplier:

AttributeFill RateStatus
product_name100%OK
price95%OK
ean88%Warning
image_url72%Warning
weight40%Fail
category100%OK
description63%Warning

In this example, 95% of products have a price (the remaining 5% need investigation), but only 40% have a weight. If your store uses weight for shipping calculations, that means 60% of products will either show wrong shipping costs or fail at checkout. That's not a minor data gap — it's a broken customer experience.

What to define up front: Which attributes are required, which are recommended, and what fill-rate threshold constitutes a pass. A feed with 100% of names but 40% of weights might be acceptable for a draft import but not for a production push.

Check 2: Format Consistency

Even when values are present, they might not be in the right format. This is especially common with numeric fields, dates, and any attribute where the supplier's system has loose input validation.

Consider a price field across three suppliers:

SupplierRaw ValueProblem
Supplier A€29.99Currency symbol embedded in value
Supplier B29,99Comma as decimal separator
Supplier C29.99 EURCurrency suffix appended
Supplier D2999Value in cents, no decimal

All four represent the same price, but if you import them into a numeric field without checking, you'll get errors or — worse — silently wrong data. 2999 treated as euros instead of cents means a product listed at nearly three thousand euros.

Format consistency checks look for:

The key insight: format problems are usually consistent per supplier. Supplier B will always use comma decimals. Once you identify the pattern, you can write a transformation rule. But first you have to catch it.

Check 3: Value Validity

A value can be present and correctly formatted and still be wrong. Validity checks test whether values fall within reasonable, expected ranges.

Examples of invalid-but-well-formatted values:

Validity rules are specific to the attribute and often to the product category. A weight of 0.01 kg is suspicious for furniture but perfectly normal for jewelry. A price of €5,000 is unusual for office supplies but expected for electronics.

Practical approach: Start with wide boundaries and tighten them over time. Flag outliers for manual review rather than rejecting them outright — sometimes the weird values are correct, and you'll learn which ranges actually matter for your catalog.

Check 4: Referential Integrity

This check answers the question: do the values in this feed match my controlled vocabulary?

Every e-commerce system has lists of allowed values for certain attributes — categories, brands, colors, materials, sizes. When a supplier sends a value that isn't in your list, it creates a mismatch. The value might not be wrong per se, but it doesn't match your system.

AttributeSupplier ValueYour ValueWhat Happens
colorNavy BlueDark BlueDuplicate filter option in store
brandACME Corp.ACMESame brand listed twice
categorySofas & CouchesSofasProducts in wrong/extra category
materialGenuine LeatherLeatherDuplicate material filter
sizeExtra LargeXLConfusing size selector

The consequence is usually duplicate filter options. Your store shows both "Navy Blue" and "Dark Blue" as separate colors, splitting inventory across two filters. Customers searching for blue products miss half of them.

A referential integrity check compares every categorical value in the feed against your controlled vocabulary and flags anything that doesn't match. The fix is usually a value mapping rule ("when supplier sends Navy Blue, map to Dark Blue") — but you have to detect the mismatch first.

Watch out for: New values that are legitimate additions to your vocabulary. Not every mismatch is wrong — sometimes the supplier is sending a genuinely new color or material you need to add to your system. The check should flag these for human review, not auto-reject them.

Check 5: Feed-Over-Feed Consistency

The first four checks look at a single feed in isolation. This one compares the current feed to the previous one. It catches a different class of problems: things that changed when they probably shouldn't have, or changed more dramatically than expected.

What to compare between feeds:

Feed-over-feed comparison is the check most teams skip because it requires keeping historical data. But it catches the most dangerous class of errors: the ones where each individual row looks fine, but the feed as a whole is wrong.

Real-world example: A supplier accidentally exports only their "active" products instead of the full catalog. The feed looks perfectly clean — all values present, all formats correct. But 200 products silently disappeared. Without a feed-over-feed comparison, you wouldn't notice until a customer tries to order something that's no longer in your system.

Building a Quality Gate

The important thing about these five checks is when they run. If you check data after it's already in your PIM or store, you're doing damage control. The data is live, customers are seeing it, and you're racing to fix problems that are already causing harm.

A quality gate runs these checks before the import. The data sits in a staging area while the checks run. If it passes, it flows into your system automatically. If it fails, it gets held for review.

A practical quality gate has three tiers:

  1. Pass: All checks within acceptable thresholds. Data imports automatically.
  2. Warning: Some checks flagged minor issues (e.g., a few new category values, slight completeness drop). Data imports, but someone reviews the flags within 24 hours.
  3. Fail: Critical issues detected (e.g., product count dropped 20%, required fields missing at scale, prices changed dramatically). Import is blocked until a human reviews and approves.

The thresholds are yours to define and will evolve over time. Start permissive and tighten as you learn which failures are noise and which ones matter.

You can implement this with a script that runs on each incoming file, a cron job that processes a staging folder, or a tool built for the purpose. FeedPrep, for example, runs feed health reports on every incoming supplier feed and flags issues before data enters your workflow. But the principle is the same regardless of implementation: check first, import second.

Start With One Check, Then Stack

You don't need all five checks on day one. If you're doing nothing today, start with completeness — it's the easiest to implement and catches the most common problems. A simple script that counts non-empty values per column and flags anything below your threshold will catch more issues than you expect.

Once completeness is working, add format consistency. Then validity ranges. Then referential integrity. Feed-over-feed comparison comes last because it requires the most infrastructure (you need to store and compare historical feeds), but it's also the one that catches the most subtle and dangerous problems.

The goal isn't perfection. It's a repeatable process that catches the obvious problems before they become customer-facing. Five checks, run consistently, will prevent the vast majority of data quality incidents.

Catch Data Issues Before They Go Live

FeedPrep runs automated quality checks on every supplier feed and flags problems before import.

Start Free Trial