If you work in e-commerce or product data management, you've dealt with normalization whether you called it that or not. It's what you're doing every time you change colour_name to color, convert 40 cm to 400 mm, or replace meadow with green.

This guide explains what product data normalization actually means in practice — not the database theory version, but the real-world version that matters when you're staring at a messy supplier CSV.

What Is Product Data Normalization?

Normalization is the process of transforming inconsistent data into a consistent, standardized format. In the context of product data, it means taking whatever your suppliers give you and turning it into something that matches your internal schema — your canonical set of attributes, values, and units.

The goal is simple: every product in your catalog should describe the same attribute the same way, regardless of which supplier provided it.

The Three Layers of Product Data Normalization

Normalization isn't one task. It's three distinct layers, each solving a different problem.

Layer 1: Column Mapping

Different suppliers use different names for the same thing. Column mapping is the process of translating supplier column names to your canonical attribute names.

The problem:

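| Supplier A | Supplier B | Supplier C | Your Schema |
| --- | --- | --- | --- |
| colour_name | Color | clr | color |
| Gewicht | Weight (kg) | wt | weight |
| Breite | Width | w_cm | width |
| Produktname | Product Title | name | name |
| Mat. | Material Type | material | material |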

Five suppliers might use five different names for color. Column mapping creates a translation layer: "when you see colour_name from Supplier A, that's our color attribute."

This is the most fundamental layer. Without it, your data can't even enter your system in the right shape.
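
A column map can be as simple as one dictionary per supplier. The sketch below is a minimal illustration, not a prescribed implementation: the `COLUMN_MAPS` structure, the supplier keys, and the `map_columns` helper are assumed names, and the column names come from the table above. Columns without a mapping are returned separately so that they can be reviewed rather than silently dropped.

```python
# Hypothetical per-supplier column maps, using the column names from the table above.
COLUMN_MAPS = {
    "supplier_a": {
        "colour_name": "color",
        "Gewicht": "weight",
        "Breite": "width",
        "Produktname": "name",
        "Mat.": "material",
    },
    "supplier_c": {
        "clr": "color",
        "wt": "weight",
        "w_cm": "width",
        "name": "name",
        "material": "material",
    },
}


def map_columns(row, supplier):
    """Rename a row's keys to canonical attribute names.

    Returns the mapped row plus a list of columns that have no mapping yet.
    """
    column_map = COLUMN_MAPS[supplier]
    mapped, unmapped = {}, []
    for column, value in row.items():
        if column in column_map:
            mapped[column_map[column]] = value
        else:
            unmapped.append(column)
    return mapped, unmapped


# "extra_col" is an illustrative column that the map does not know about.
row = {"colour_name": "meadow", "Breite": "40 cm", "extra_col": "n/a"}
mapped, unmapped = map_columns(row, "supplier_a")
print(mapped)    # {'color': 'meadow', 'width': '40 cm'}
print(unmapped)  # ['extra_col']
```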

Layer 2: Value Standardization

Even when columns are mapped correctly, the values inside them are often inconsistent. Value standardization translates supplier-specific values to your canonical vocabulary.

The problem:

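| Attribute | Raw Value | Normalized Value |
| --- | --- | --- |
| color | meadow | green |
| color | Midnight | black |
| color | clr_029 | navy blue |
| material | Eiche massiv | solid oak |
| material | MDF/particle board | MDF |
| size | XLarge | XL |
| size | extra-large | XL |
| boolean | Ja | true |
| boolean | Y | true |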

Suppliers use their own terminology. Marketing names instead of standard names. Internal codes instead of human-readable values. Different languages. Different abbreviations. Value standardization creates a mapping from each supplier's vocabulary to yours.

This is especially critical for select and multiselect attributes (color, material, category) where inconsistent values create duplicate filter options in your store or break your PIM's controlled vocabularies.
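
In code, value standardization is a lookup from raw value to canonical value, per attribute. The following is a rough sketch using the values from the table above. `VALUE_MAP` and `standardize_value` are assumed names, and matching case-insensitively after trimming whitespace is a design choice of this example, not a requirement.

```python
# Hypothetical value map keyed by attribute; raw values are stored in lowercase.
VALUE_MAP = {
    "color": {"meadow": "green", "midnight": "black", "clr_029": "navy blue"},
    "material": {"eiche massiv": "solid oak", "mdf/particle board": "MDF"},
    "size": {"xlarge": "XL", "extra-large": "XL"},
    "boolean": {"ja": "true", "y": "true"},
}


def standardize_value(attribute, raw):
    """Return the canonical value, or None when the raw value is not in the map."""
    key = raw.strip().casefold()
    return VALUE_MAP.get(attribute, {}).get(key)


print(standardize_value("color", "Midnight"))      # black
print(standardize_value("size", " extra-large "))  # XL
print(standardize_value("color", "sunset"))        # None
```

Returning `None` for an unknown value, instead of passing the raw value through, makes it easy for the caller to route that value to review.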

Layer 3: Unit Conversion

Measurement attributes (weight, dimensions, volume) come in different units across suppliers. Unit conversion standardizes everything to your preferred unit system.

The problem:

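| Attribute | Raw Value | Normalized Value |
| --- | --- | --- |
| width | 40 cm | 400 mm |
| width | 15.7 inches | 399 mm |
| width | 0.4 m | 400 mm |
| weight | 2500 g | 2.5 kg |
| weight | 5.5 lbs | 2.49 kg |
| date | 03/24/2026 | 2026-03-24 |
| date | 24.03.2026 | 2026-03-24 |
| price | 1.299,00 | 1299.00 |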

This layer also covers format conversions that aren't strictly "units" but behave the same way: date formats, decimal separators, currency formatting, text casing.
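
The conversions in the table above can be reproduced with a few small functions. This is a sketch under some assumptions: raw measurements look like `<number> <unit>`, the target units are millimetres and kilograms, and the date format is known per supplier (a string like `03/04/2026` cannot be interpreted safely without knowing the source convention). All function names are illustrative.

```python
from datetime import datetime
from decimal import Decimal

# Conversion factors into the target units.
MM_PER_UNIT = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "inches": 25.4}
KG_PER_UNIT = {"kg": 1.0, "g": 0.001, "lbs": 0.45359237}


def split_measurement(raw):
    """Split a string such as '40 cm' into (40.0, 'cm')."""
    number, unit = raw.split()
    return float(number), unit


def to_mm(raw):
    number, unit = split_measurement(raw)
    return round(number * MM_PER_UNIT[unit])


def to_kg(raw):
    number, unit = split_measurement(raw)
    return round(number * KG_PER_UNIT[unit], 2)


def to_iso_date(raw, source_format):
    """Convert a date string to ISO 8601, given the supplier's known format."""
    return datetime.strptime(raw, source_format).date().isoformat()


def parse_european_decimal(raw):
    """Parse '1.299,00' (dot as thousands separator, comma as decimal mark)."""
    return Decimal(raw.replace(".", "").replace(",", "."))


print(to_mm("40 cm"), to_mm("15.7 inches"), to_mm("0.4 m"))  # 400 399 400
print(to_kg("2500 g"), to_kg("5.5 lbs"))                     # 2.5 2.49
print(to_iso_date("03/24/2026", "%m/%d/%Y"))                 # 2026-03-24
print(to_iso_date("24.03.2026", "%d.%m.%Y"))                 # 2026-03-24
print(parse_european_decimal("1.299,00"))                    # 1299.00
```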

Why Manual Normalization Doesn't Scale

If you have 3 suppliers and do this quarterly, a spreadsheet works fine. But normalization becomes a serious problem as the number of suppliers and the frequency of updates grow.

The core problem is that spreadsheets don't remember your normalization logic. Every month, when a supplier sends an updated file, you start the same process from scratch. The VLOOKUP formulas, the find-and-replace sequences, the manual column reordering — none of it persists in a reusable, shareable way.

What Good Normalization Looks Like: Before and After

Here's a real-world example. Three suppliers send product data for the same type of product (a dining chair). Side by side, the raw data looks like this:

Before normalization:

| Attribute | Supplier A (German) | Supplier B (English) | Supplier C (Internal codes) |
| --- | --- | --- | --- |
| Product name | Produktname: Esszimmerstuhl "Oslo" | Product Title: Oslo Dining Chair | name: CHAIR-OSL-BLK |
| Color | Farbe: Schwarz | Color: Midnight | clr: BLK |
| Material | Mat.: Eiche massiv | Material Type: Solid Oak Wood | material: OAK-S |
| Width | Breite: 45 cm | Width: 17.7 inches | w_cm: 45 |
| Weight | Gewicht: 8500 g | Weight (kg): 8.5 | wt: 18.7 |
| Price | Preis: 299,00 | Price: 299.00 | price: 29900 |

After normalization:

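| Attribute | Supplier A | Supplier B | Supplier C |
| --- | --- | --- | --- |
| name | Oslo Dining Chair | Oslo Dining Chair | Oslo Dining Chair |
| color | black | black | black |
| material | solid oak | solid oak | solid oak |
| width | 450 mm | 450 mm | 450 mm |
| weight | 8.5 kg | 8.5 kg | 8.5 kg |
| price | 299.00 | 299.00 | 299.00 |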

Same product, same data, same format — regardless of which supplier it came from. That's what normalization achieves.

Making Normalization Repeatable

The key insight is that normalization rules are specific to each supplier but stable over time. Supplier A will keep calling it Farbe. Supplier C will keep using internal codes. These patterns don't change often.

This means if you can save your normalization rules per supplier, you only need to do the work once. The second time that supplier sends a file, the same rules apply automatically. New values get flagged for review; known values flow through untouched.

This is the "map it once, reuse forever" pattern. Whether you implement it with a dedicated tool, a well-structured script, or a very disciplined spreadsheet process, the principle is the same: capture normalization logic once and apply it automatically.
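
As a sketch of the "well-structured script" option, the example below stores one supplier's rules as JSON and reapplies them to a later feed. The rule layout (`columns`, `values`), the function name `apply_rules`, and the second feed's new color value are all assumptions made for illustration. Known values are translated, and values missing from a controlled vocabulary are passed through but recorded for review.

```python
import json

# Hypothetical rule set for one supplier, captured the first time their file was mapped.
rules = {
    "supplier": "supplier_a",
    "columns": {"Produktname": "name", "Farbe": "color", "Mat.": "material"},
    "values": {
        "color": {"Schwarz": "black"},
        "material": {"Eiche massiv": "solid oak"},
    },
}

# Persist once (here to a string; in practice a file or database), reload on every feed.
saved = json.dumps(rules)
loaded_rules = json.loads(saved)


def apply_rules(rows, rules):
    """Apply saved column and value rules to a feed.

    Returns the normalized rows and a list of (row index, attribute, raw value)
    entries for values that have no saved mapping yet.
    """
    normalized, needs_review = [], []
    for index, row in enumerate(rows):
        out = {}
        for column, raw in row.items():
            attribute = rules["columns"].get(column)
            if attribute is None:
                continue  # unmapped columns are skipped in this sketch
            value_map = rules["values"].get(attribute)
            if value_map is None:
                out[attribute] = raw  # free-text attribute, no controlled vocabulary
            elif raw in value_map:
                out[attribute] = value_map[raw]
            else:
                out[attribute] = raw
                needs_review.append((index, attribute, raw))
        normalized.append(out)
    return normalized, needs_review


# A later feed from the same supplier, containing one color not seen before.
later_feed = [
    {"Produktname": "Oslo", "Farbe": "Schwarz", "Mat.": "Eiche massiv"},
    {"Produktname": "Oslo", "Farbe": "Anthrazit", "Mat.": "Eiche massiv"},
]
normalized, needs_review = apply_rules(later_feed, loaded_rules)
print(normalized[0])  # {'name': 'Oslo', 'color': 'black', 'material': 'solid oak'}
print(needs_review)   # [(1, 'color', 'Anthrazit')]
```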

Common Normalization Pitfalls

1. Normalizing too aggressively

Not every difference needs to be normalized. If a supplier calls a product "Oslo Dining Chair" and you prefer "Dining Chair Oslo," that's a style preference, not a data quality issue. Normalize for consistency and correctness, not for perfection.

2. Forgetting about new values

Suppliers add new products with new attribute values. If your normalization process only handles known values and silently passes through unknowns, you end up with a mix of normalized and raw data. Unknown values should be flagged, not ignored.

3. One-size-fits-all rules

A value mapping that works for Supplier A might be wrong for Supplier B. Large might mean L for apparel but a specific measurement range for furniture. Keep rules scoped to the right level — per supplier when meanings differ, workspace-wide when they're universal.
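
One way to express this scoping is a two-level lookup: check the supplier's own rules first, then fall back to workspace-wide rules. The sketch below assumes hypothetical supplier names, and the furniture size range is an invented placeholder rather than a real standard.

```python
# Workspace-wide rules: meanings that hold for every supplier.
GLOBAL_VALUES = {"size": {"XLarge": "XL", "extra-large": "XL"}}

# Supplier-scoped rules: the same raw value can mean different things.
SUPPLIER_VALUES = {
    "apparel_supplier": {"size": {"Large": "L"}},
    "furniture_supplier": {"size": {"Large": "180-220 cm"}},  # placeholder range
}


def resolve(supplier, attribute, raw):
    """Look up a raw value, preferring supplier-specific rules over global ones."""
    supplier_map = SUPPLIER_VALUES.get(supplier, {}).get(attribute, {})
    if raw in supplier_map:
        return supplier_map[raw]
    return GLOBAL_VALUES.get(attribute, {}).get(raw)


print(resolve("apparel_supplier", "size", "Large"))     # L
print(resolve("furniture_supplier", "size", "Large"))   # 180-220 cm
print(resolve("furniture_supplier", "size", "XLarge"))  # XL
print(resolve("other_supplier", "size", "Large"))       # None
```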

4. No validation after normalization

Normalization can introduce its own errors: a wrong unit conversion formula, a value mapping that catches too broadly, a column mapped to the wrong attribute. Always review normalized output, especially the first time a rule runs.
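
Part of that review can be automated with simple sanity checks on the normalized output. In the sketch below, the attribute names `width_mm` and `weight_kg`, the allowed vocabularies, and the plausibility bounds are assumptions chosen for illustration. A check like this would catch, for example, a width that was multiplied by the wrong factor.

```python
# Hypothetical canonical vocabularies and plausibility bounds for a furniture catalog.
ALLOWED = {
    "color": {"black", "green", "navy blue"},
    "material": {"solid oak", "MDF"},
}
RANGES = {"width_mm": (100, 3000), "weight_kg": (0.1, 200)}


def validate(product):
    """Return a list of human-readable problems found in a normalized product."""
    problems = []
    for attribute, allowed in ALLOWED.items():
        value = product.get(attribute)
        if value is not None and value not in allowed:
            problems.append(f"{attribute}: {value!r} is not a canonical value")
    for attribute, (low, high) in RANGES.items():
        value = product.get(attribute)
        if value is not None and not low <= value <= high:
            problems.append(f"{attribute}: {value} is outside {low}-{high}")
    return problems


good = {"color": "black", "material": "solid oak", "width_mm": 450, "weight_kg": 8.5}
bad = {"color": "Midnight", "material": "solid oak", "width_mm": 4500, "weight_kg": 8.5}
print(validate(good))  # []
print(validate(bad))
# ["color: 'Midnight' is not a canonical value", 'width_mm: 4500 is outside 100-3000']
```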

Getting Started

If you're doing normalization manually today, start by documenting your three layers:

  1. Column map: For each supplier, list their column names and what they map to in your schema.
  2. Value map: For select/multiselect attributes, list the supplier values and your canonical equivalents.
  3. Unit rules: For measurement attributes, document the source unit and target unit per supplier.

Just documenting this will reveal how much implicit knowledge exists in your team's heads. That's the knowledge that needs to be in a system — whether that's a tool like FeedPrep, a well-maintained configuration file, or at minimum a shared document your whole team can reference.

Normalize Supplier Data Without the Spreadsheet

FeedPrep saves your normalization rules per supplier and applies them automatically on every future feed.
