Understanding the Normalization Pipeline

When FeedPrep normalizes your product data, it runs each value through a 13-step pipeline in a fixed order. Every step has a specific job, and the order matters — earlier steps prepare the data for later ones. Here's exactly what happens at each stage.

The 13 Steps

1. Whitespace Cleanup

The first step trims leading and trailing whitespace from every value, collapses multiple consecutive spaces into a single space, and converts empty or whitespace-only strings to null. This ensures that no downstream step is tripped up by invisible formatting differences.
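
As a rough illustration, the cleanup described above could look like the sketch below. The function name clean_whitespace is hypothetical and this is not FeedPrep's actual code. It also assumes that "spaces" means the literal space character, so tabs inside a value are left alone.

```python
import re
from typing import Optional


def clean_whitespace(value: Optional[str]) -> Optional[str]:
    """Trim, collapse runs of spaces, and turn blank strings into None."""
    if value is None:
        return None
    # strip() removes leading and trailing whitespace of any kind.
    trimmed = value.strip()
    # Collapse two or more consecutive spaces into one (assumption: spaces only).
    collapsed = re.sub(r" {2,}", " ", trimmed)
    # An empty result means the input was empty or whitespace-only.
    return collapsed or None
```

With this sketch, a padded value such as "  Solid   oak  " comes out as "Solid oak", and a whitespace-only string comes out as None.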

2. HTML Stripping

Removes HTML tags from values, leaving only the text content. This is important for fields where suppliers include markup (often in descriptions or titles). If the adapter has the keep_html setting enabled, this step is skipped for that adapter's feeds, preserving the original HTML.
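
One way to picture this step is the sketch below, built on Python's standard html.parser module. The function strip_html and its keep_html parameter are illustrative names that mirror the setting described above. FeedPrep's own implementation may differ, for example in how it handles entities or script content.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(value: str, keep_html: bool = False) -> str:
    """Return the text content of value, or value itself if keep_html is set."""
    if keep_html:
        # Mirrors the adapter setting: the step is skipped entirely.
        return value
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```

Called on "<p>Solid <b>oak</b> table</p>", the sketch returns "Solid oak table". With keep_html=True it returns the markup untouched.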

3. Column Mapping

Maps the supplier's raw column names to your canonical attribute names using the adapter's column mapping rules. For example, colour_name becomes color and Gewicht becomes weight. If both the adapter and the feed have column mapping rules, feed-level rules take priority and override the adapter.
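
The override behavior can be sketched as a simple dictionary merge. The function names and the rule format (a plain dict from raw name to canonical name) are assumptions for illustration, as is the choice to pass unmapped columns through unchanged.

```python
from typing import Dict, Optional


def resolve_column_map(adapter_rules: Dict[str, str],
                       feed_rules: Dict[str, str]) -> Dict[str, str]:
    """Merge both rule sets; feed-level entries override adapter entries."""
    return {**adapter_rules, **feed_rules}


def map_columns(row: Dict[str, str],
                adapter_rules: Dict[str, str],
                feed_rules: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Rename the keys of one row using the merged mapping rules."""
    rules = resolve_column_map(adapter_rules, feed_rules or {})
    # Assumption: columns without a rule keep their original name.
    return {rules.get(column, column): value for column, value in row.items()}
```

If the adapter maps Gewicht to weight and the feed maps Gewicht to a different attribute, the feed's choice is the one that applies to that feed.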

4. Unit Extraction

For measurement fields, this step extracts the numeric value and the unit from a combined string using pattern matching. A value like 40 cm is parsed into the number 40 and the unit cm. This separation is necessary before unit conversion can happen in the next step.
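
The pattern matching could resemble the sketch below. The regular expression is an assumption, not the pattern FeedPrep uses: it accepts an optional sign, a decimal point or decimal comma, and a purely alphabetic unit, and it lowercases the unit.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: number, optional whitespace, alphabetic unit.
_MEASUREMENT = re.compile(r"^\s*(-?\d+(?:[.,]\d+)?)\s*([A-Za-z]+)\s*$")


def extract_unit(value: str) -> Optional[Tuple[float, str]]:
    """Split a string like '40 cm' into (40.0, 'cm'); None if it does not match."""
    match = _MEASUREMENT.match(value)
    if match is None:
        return None
    # Treat a decimal comma like a decimal point (assumption).
    number = float(match.group(1).replace(",", "."))
    return number, match.group(2).lower()
```

Here "40 cm" yields the pair (40.0, "cm"), while a value with no leading number, such as "n/a", yields None.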

5. Measurement Normalization

Converts extracted measurements to your preferred target unit. Supported unit families include:

Length: mm, cm, m, km, in, ft
Weight: g, kg, lb, oz
Volume: ml, cl, l

For example, if your target unit for length is millimeters, 40 cm becomes 400 mm and 15.7 in becomes 399 mm (398.78 mm, rounded).
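
The arithmetic behind these conversions can be sketched with one factor table per unit family, where each factor expresses the unit in a base unit (mm, g, or ml). The table layout and the convert function are illustrative, not FeedPrep's internals. The factors for in, ft, lb, and oz are the standard exact definitions.

```python
# Each unit expressed in its family's base unit.
UNIT_FAMILIES = {
    "length": {"mm": 1.0, "cm": 10.0, "m": 1000.0, "km": 1_000_000.0,
               "in": 25.4, "ft": 304.8},
    "weight": {"g": 1.0, "kg": 1000.0, "lb": 453.59237, "oz": 28.349523125},
    "volume": {"ml": 1.0, "cl": 10.0, "l": 1000.0},
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between two units of the same family via the base unit."""
    for factors in UNIT_FAMILIES.values():
        if from_unit in factors and to_unit in factors:
            return value * factors[from_unit] / factors[to_unit]
    # Units from different families (or unknown units) cannot be converted.
    raise ValueError(f"cannot convert {from_unit!r} to {to_unit!r}")
```

In this sketch, convert(15.7, "in", "mm") gives roughly 398.78, so rounding to a whole number reproduces the 399 mm in the example above.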

6. Date Normalization

Auto-detects the format of date values from 11 recognized patterns and converts them to a configurable output format. The default output format is Y-m-d (e.g., 2026-03-30). This handles the common problem of suppliers using different date conventions — 03/30/2026, 30.03.2026, 30-Mar-2026, and similar variations all normalize to the same output.
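
A common way to implement this kind of detection is to try candidate formats in order until one parses. The sketch below covers only four formats chosen for illustration, not FeedPrep's 11 patterns, and writes them as Python strptime codes, so the Y-m-d default becomes %Y-%m-%d. Returning None for unrecognized input is also an assumption.

```python
from datetime import datetime
from typing import Optional

# Illustrative subset; order matters for ambiguous inputs such as 03/04/2026.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y", "%d-%b-%Y"]


def normalize_date(value: str, output_format: str = "%Y-%m-%d") -> Optional[str]:
    """Return value reformatted with output_format, or None if no format fits."""
    text = value.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            # %b (month abbreviation) depends on the locale; English is assumed.
            return datetime.strptime(text, fmt).strftime(output_format)
        except ValueError:
            continue
    return None
```

Note that a slash-separated date like 03/04/2026 is inherently ambiguous. In this sketch the order of the candidate list decides, and it reads the month first.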

7. Boolean Normalization

Maps 22+ common boolean representations to standardized true or false values. Recognized truthy values include: yes, ja, oui, si, 1, active, true, y, and more. Their negative counterparts map to false. Custom boolean mappings are also supported for supplier-specific conventions.
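
A minimal lookup-based sketch of this step follows. The truthy set is taken from the list above. The falsy set is an assumed set of counterparts, and the case-insensitive comparison and the custom parameter are illustrative choices, not confirmed FeedPrep behavior.

```python
from typing import Dict, Optional

TRUTHY = {"yes", "ja", "oui", "si", "1", "active", "true", "y"}
# Assumed negative counterparts; the real list may differ.
FALSY = {"no", "nein", "non", "0", "inactive", "false", "n"}


def normalize_boolean(value: str,
                      custom: Optional[Dict[str, bool]] = None) -> Optional[bool]:
    """Map a textual boolean to True/False; None if it is not recognized."""
    key = value.strip().lower()
    # Supplier-specific mappings are checked first (keys assumed lowercase).
    if custom and key in custom:
        return custom[key]
    if key in TRUTHY:
        return True
    if key in FALSY:
        return False
    return None
```

A custom mapping such as {"in stock": True} lets a supplier-specific phrase resolve to a boolean without touching the built-in lists.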

8. Value Mapping

Applies value mappings to translate supplier-specific values to your approved vocabulary. Matching follows a priority order: exact match first, then case-insensitive match, then wildcard match. For example, a mapping of Midnight → black on the color attribute ensures that whether the supplier sends Midnight, midnight, or MIDNIGHT, the value normalizes to black.
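
The priority order can be sketched as three passes over the mapping table. The function map_value is a hypothetical name. Treating * as a glob-style wildcard (via Python's fnmatch module) and returning unmatched values unchanged are assumptions.

```python
from fnmatch import fnmatchcase
from typing import Dict


def map_value(value: str, mappings: Dict[str, str]) -> str:
    """Resolve value through exact, case-insensitive, then wildcard matching."""
    # 1. Exact match.
    if value in mappings:
        return mappings[value]
    lowered = value.lower()
    # 2. Case-insensitive match.
    for source, target in mappings.items():
        if source.lower() == lowered:
            return target
    # 3. Wildcard match (assumption: '*' is a glob-style wildcard).
    for source, target in mappings.items():
        if "*" in source and fnmatchcase(lowered, source.lower()):
            return target
    # Assumption: values with no mapping pass through unchanged.
    return value
```

With the mappings {"Midnight": "black", "navy*": "blue"}, the inputs Midnight and MIDNIGHT both resolve to black, and Navy Stripe resolves to blue through the wildcard pass.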

9. Transform Rules

Applies your custom transform rules once the preceding normalization steps have run. FeedPrep supports 12 operation types including mathematical operations, text manipulation, conditional logic, and concatenation. Transform rules are applied in the order they are defined and can reference values from other columns, enabling complex derived calculations.
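
To show why rule order matters, here is a deliberately simplified sketch in which each rule is a target column plus a Python function of the row. This representation is purely hypothetical. FeedPrep's real rules are configured through its 12 operation types, which are not modeled here.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Rule = Tuple[str, Callable[[Row], Any]]  # (target column, function of the row)


def apply_transforms(row: Row, rules: List[Rule]) -> Row:
    """Apply rules in definition order; later rules see earlier results."""
    result = dict(row)  # work on a copy so the input row is untouched
    for target, compute in rules:
        result[target] = compute(result)
    return result
```

A first rule could multiply three dimension columns into a volume, and a second rule could then concatenate that volume into a label. Swapping the two rules would fail, because the label rule depends on a column the volume rule creates.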

10. Default Values

Fills in missing or empty fields with configured default values. This runs after the transformation steps above (1 through 9), so it only kicks in when a value is still null or empty after those steps have had their chance to populate it.
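
In code, the "only when still null or empty" rule might look like the following sketch. The function name and the dict-based row format are assumptions for illustration.

```python
from typing import Any, Dict


def apply_defaults(row: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Fill fields that are missing, None, or an empty string."""
    result = dict(row)
    for field, default in defaults.items():
        # Values such as 0 or False count as populated and are kept.
        if result.get(field) in (None, ""):
            result[field] = default
    return result
```

A field that an earlier step already populated keeps its value. Only gaps are filled.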

11. Multiselect Parsing

For multiselect attributes, this step splits a single cell value into multiple discrete values using a separator. FeedPrep auto-detects the separator, supporting pipe (|), semicolon (;), and comma (,). For example, red|blue|green is parsed into three separate values: red, blue, and green.
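
A simple form of separator auto-detection is to use the first supported separator that appears in the value. The priority order below (pipe, then semicolon, then comma) and the trimming of each part are assumptions. FeedPrep's detection logic may be more elaborate.

```python
from typing import List

# Assumed priority order when more than one separator is present.
SEPARATORS = ("|", ";", ",")


def parse_multiselect(value: str) -> List[str]:
    """Split a cell into discrete values using the first separator found."""
    for separator in SEPARATORS:
        if separator in value:
            parts = (part.strip() for part in value.split(separator))
            return [part for part in parts if part]  # drop empty fragments
    # No separator: the cell holds at most a single value.
    stripped = value.strip()
    return [stripped] if stripped else []
```

For the example above, parse_multiselect("red|blue|green") returns the three values red, blue, and green as a list.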

12. Approved Value Validation

Validates values for select and multiselect attributes against the list of approved options defined in your schema. Any value that does not match an approved option is nullified — replaced with null rather than passed through. This prevents unapproved values from leaking into your exports and downstream systems.
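
The nullification rule can be sketched as a membership check. Two details here are assumptions: the comparison is exact, and for multiselect attributes each unapproved item is dropped individually, with the whole field becoming None when nothing survives.

```python
from typing import Iterable, List, Optional, Set


def validate_select(value: Optional[str], approved: Set[str]) -> Optional[str]:
    """Keep value only if it is an approved option; otherwise return None."""
    return value if value in approved else None


def validate_multiselect(values: Iterable[str],
                         approved: Set[str]) -> Optional[List[str]]:
    """Keep only approved items; None if no item is approved (assumption)."""
    kept = [item for item in values if item in approved]
    return kept or None
```

With approved options black and white, the value black passes through and midnight becomes None, which is why the value mapping step earlier in the pipeline matters.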

13. Case Normalization

The final step converts text values to Title Case for visual consistency. This ensures that values like solid oak, SOLID OAK, and Solid oak all render as Solid Oak in your normalized output.
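
In Python, a comparable result can be obtained with string.capwords, as in the sketch below. This is only an approximation of the behavior described. How FeedPrep treats apostrophes, hyphens, or acronyms is not specified here.

```python
import string
from typing import Optional


def to_title_case(value: Optional[str]) -> Optional[str]:
    """Capitalize the first letter of each whitespace-separated word."""
    if value is None:
        return None
    # capwords lowercases the rest of each word, unlike a plain upper-first.
    # It avoids str.title() quirks such as "men's" becoming "Men'S".
    return string.capwords(value)
```

All three variants from the example (solid oak, SOLID OAK, Solid oak) come out as Solid Oak.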

Key Concept: Non-Destructive Normalization

An important property of FeedPrep's pipeline is that normalization happens live during review and export. Your raw data is always preserved exactly as it was imported. The pipeline runs on-the-fly when you view normalized data or generate an export, which means:

You can always see the original raw values alongside the normalized output
Changing a mapping or rule retroactively updates all affected values
There is no risk of data loss from an incorrect normalization rule
You can experiment with different configurations without re-importing the feed

This "raw data in, normalized data out" architecture ensures your source data remains a reliable reference at all times.
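
The idea can be illustrated with a small sketch in which the stored raw values are never modified and the normalized view is computed on demand from a list of step functions. All names here (normalize, normalized_view, trim, make_mapper) are hypothetical, and the two steps stand in for the full pipeline.

```python
from typing import Callable, Dict, List, Optional

Step = Callable[[Optional[str]], Optional[str]]


def normalize(raw_value: Optional[str], steps: List[Step]) -> Optional[str]:
    """Run one raw value through the steps in order, without mutating it."""
    value = raw_value
    for step in steps:
        value = step(value)
    return value


def normalized_view(raw_row: Dict[str, Optional[str]],
                    steps: List[Step]) -> Dict[str, Optional[str]]:
    """Build a fresh normalized row; raw_row itself is left as imported."""
    return {column: normalize(value, steps) for column, value in raw_row.items()}


def trim(value: Optional[str]) -> Optional[str]:
    return value.strip() if value is not None else None


def make_mapper(mapping: Dict[str, str]) -> Step:
    """Create a step that maps values case-insensitively (keys lowercase)."""
    def mapper(value: Optional[str]) -> Optional[str]:
        return mapping.get(value.lower(), value) if value is not None else None
    return mapper
```

Because the view is recomputed from the raw row each time, swapping the mapping from midnight → black to midnight → navy changes the output immediately, while the stored raw value stays exactly as imported.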

What's Next?

Now that you understand how the pipeline works, head back to the Help Center to explore guides on specific topics like value mapping, transform rules, and export configuration.