HTML stripping and whitespace cleanup are the first two steps in FeedPrep's normalization pipeline. They run before any column mapping, value mapping, or transformation — ensuring that downstream steps work with clean, consistent text.

Step 1: Whitespace Cleanup

Supplier feeds frequently contain messy whitespace: extra spaces, leading or trailing blanks, and fields that look like they have a value but are actually just empty space. Whitespace cleanup handles all of this automatically.

What it does:

Raw Value | After Cleanup
"  Oslo Dining Chair  " | "Oslo Dining Chair"
"Black    Leather" | "Black Leather"
"   " (whitespace only) | null
"" (empty string) | null
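
The behavior in the table can be sketched in a few lines of Python. This is a minimal illustration, not FeedPrep's actual implementation: the function name clean_whitespace is hypothetical, and treating tabs and newlines the same as spaces is an assumption the table does not confirm.

```python
import re
from typing import Optional

# Matches any run of whitespace (assumption: tabs and newlines count too).
_WS_RUN = re.compile(r"\s+")


def clean_whitespace(value: Optional[str]) -> Optional[str]:
    """Trim the value, collapse internal whitespace runs, map blanks to None."""
    if value is None:
        return None
    cleaned = _WS_RUN.sub(" ", value).strip()
    # An empty result means the field held no real content.
    return cleaned or None


examples = ["  Oslo Dining Chair  ", "Black    Leather", "   ", ""]
results = [clean_whitespace(v) for v in examples]
```

Here results holds one cleaned value per table row, with None standing in for null in the last two cases.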

Step 2: HTML Stripping

Many supplier feeds contain HTML markup embedded in field values — especially in product descriptions, but sometimes in unexpected places like titles or specifications. HTML stripping removes all HTML tags from field values, leaving only the plain text content.

Examples:

Raw Value | After Stripping
<p>Solid oak <strong>dining</strong> chair</p> | Solid oak dining chair
Color: <span style="color:red">Red</span> | Color: Red
<br/>45 cm<br/> | 45 cm
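
One way to reproduce these results is with the HTMLParser class from Python's standard library, which reports text nodes separately from tags. The sketch below is an assumption about how such a step could work, not a description of FeedPrep's internals. The names _TextCollector and strip_html are hypothetical, and decoding entities such as &amp; (a side effect of convert_charrefs=True) is a choice made here, not something the examples above specify.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects only the text nodes of a document and ignores every tag."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities like &amp; into plain text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(value: str) -> str:
    """Return the plain text content of value with all tags removed."""
    collector = _TextCollector()
    collector.feed(value)
    collector.close()  # flushes any text still buffered by the parser
    return "".join(collector.parts)


stripped = strip_html("<p>Solid oak <strong>dining</strong> chair</p>")
```

A parser is used instead of a regular expression because attribute values can contain characters that a naive tag pattern would mishandle.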

Per-Column "Keep HTML" Option

Not all HTML should be stripped. Product descriptions often contain intentional formatting — bullet lists, bold text, paragraph breaks — that should be preserved for display on your storefront or marketplace listing.

FeedPrep's adapter includes a keep_html setting that lets you specify which fields should retain their HTML content. When a field is marked with keep_html, the HTML stripping step skips that field entirely.

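Common use cases for keeping HTML include product descriptions whose bullet lists, bold text, or paragraph breaks are displayed on your storefront or marketplace listing.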

Fields not marked with keep_html will always have their HTML stripped. This default-strip approach ensures that stray HTML tags don't contaminate structured data fields like color, material, or dimensions.
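
The default-strip rule with a per-field exception might look like the following sketch. Only the keep_html name comes from FeedPrep. Passing it as a list of field names, the function strip_html_fields, and the simplistic tag pattern are all assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, Optional

# Simplistic tag matcher, good enough for this illustration only.
_TAG = re.compile(r"<[^>]+>")


def strip_html_fields(
    row: Dict[str, Optional[str]],
    keep_html: Iterable[str] = (),
) -> Dict[str, Optional[str]]:
    """Strip tags from every field except those listed in keep_html."""
    keep = set(keep_html)
    cleaned = {}
    for field, value in row.items():
        if value is None or field in keep:
            cleaned[field] = value  # skipped entirely: HTML is retained
        else:
            cleaned[field] = _TAG.sub("", value)  # default: strip
    return cleaned


row = {
    "title": "<b>Oslo</b> Dining Chair",
    "description": "<ul><li>Solid oak</li></ul>",
    "color": "<span>Black</span>",
}
result = strip_html_fields(row, keep_html=["description"])
```

In this example the description keeps its list markup, while title and color come out as plain text.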

Pipeline Position

Whitespace cleanup and HTML stripping are deliberately the first steps in the normalization pipeline:

  1. Whitespace cleanup
  2. HTML stripping
  3. Column mapping
  4. Value mapping
  5. Unit conversion
  6. Transform rules
  7. Validation

By running first, they ensure that every subsequent step (column mapping, value matching, unit extraction) works with clean text. A value like "   <b>Black</b>   " becomes "Black" before the value mapper ever sees it, so your mapping rules only need to handle the actual content, not formatting artifacts.
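
To see why the ordering matters, the sketch below chains simplified stand-ins for the first two steps ahead of a value mapper. Every name here (cleanup_whitespace, strip_html, map_value, COLOR_MAP, run_pipeline) is hypothetical, and the mapping rule is invented for the example. It shows the idea only and does not reflect FeedPrep's code.

```python
import re
from typing import Callable, List, Optional

Step = Callable[[Optional[str]], Optional[str]]


def cleanup_whitespace(value: Optional[str]) -> Optional[str]:
    if value is None:
        return None
    return re.sub(r"\s+", " ", value).strip() or None


def strip_html(value: Optional[str]) -> Optional[str]:
    if value is None:
        return None
    return re.sub(r"<[^>]+>", "", value)  # simplistic tag removal


# Hypothetical value-mapping rule: it only knows the clean spelling.
COLOR_MAP = {"Black": "black"}


def map_value(value: Optional[str]) -> Optional[str]:
    if value is None:
        return None
    return COLOR_MAP.get(value, value)


def run_pipeline(value: Optional[str], steps: List[Step]) -> Optional[str]:
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        value = step(value)
    return value


raw = "   <b>Black</b>   "
with_cleanup = run_pipeline(raw, [cleanup_whitespace, strip_html, map_value])
without_cleanup = run_pipeline(raw, [map_value])
```

With the cleanup steps in front, the single rule for "Black" matches. Without them the raw value passes through unmapped, so the mapping table would need an entry for every formatting variant.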

Clean Data From the First Step

FeedPrep automatically strips HTML and normalizes whitespace so your data is consistent before any mapping begins.
