How to Clean a Supplier CSV Before Importing Into Your PIM

You just got a fresh CSV from a supplier. You open it in your PIM's import tool, hit upload, and… 47 errors. Half the rows failed validation. Some fields landed in the wrong columns. The prices look like phone numbers.

If this sounds familiar, you're not alone. Supplier CSVs are almost never import-ready. This guide gives you a concrete, step-by-step checklist for cleaning them up before they go anywhere near Akeneo, Pimcore, Salsify, or any other PIM system.

Why Supplier CSVs Are Never Import-Ready

Suppliers don't think about your PIM. They export data from their own systems — ERPs, spreadsheets, custom databases — and what you receive is shaped by their tools, not yours. Here's what typically goes wrong:

Encoding mismatches. They saved it in Latin-1 or Windows-1252, your PIM expects UTF-8. Special characters turn into garbled sequences.
Wrong delimiters. European suppliers often use semicolons. Some systems export tabs. Your PIM expects commas.
Merged cells exported badly. What looked fine in Excel becomes empty rows and shifted columns in CSV.
Inconsistent quoting. Some fields are quoted, some aren't. Fields containing the delimiter character break the column structure entirely.
Junk rows. Company branding in row 1, subtotals at the bottom, blank rows in between.

None of this is malicious. It's just what happens when data crosses system boundaries. The fix is a consistent cleanup process, applied before every import.

The 10-Step CSV Cleanup Checklist

Work through these steps in order. Each one builds on the previous — there's no point mapping columns if half the rows are corrupted by an encoding issue.

Step 1: Check File Encoding

The most common silent destroyer of product data. If the encoding is wrong, everything downstream can break — special characters, accented letters, currency symbols.

What to look for:

Ã¼ instead of ü (UTF-8 bytes read as Latin-1)
â€“ instead of an em dash
A character at the very start of the file (the UTF-8 BOM — invisible in most editors but breaks header detection)

Before:

name;description
Stühle Modell Zürich;Massivholz, Öko-zertifiziert

Opened as Latin-1, displayed as UTF-8:

name;description
StÃ¼hle Modell ZÃ¼rich;Massivholz, Ãko-zertifiziert

After (correct UTF-8):

name;description
Stühle Modell Zürich;Massivholz, Öko-zertifiziert

How to fix: Open the file in a text editor that shows encoding (VS Code, Notepad++, Sublime Text). Convert to UTF-8 without BOM. On the command line: file -i supplier.csv tells you the current encoding, and iconv -f WINDOWS-1252 -t UTF-8 supplier.csv > supplier-utf8.csv converts it.

Step 2: Verify Delimiter and Quoting

A CSV is only a CSV if you know what separates the columns. Don't assume commas.

Before:

sku;name;description;price
A100;"Garden Chair, foldable";"Lightweight, UV-resistant";49,90

Imported with comma delimiter — columns split incorrectly:

Column 1	Column 2	Column 3	Column 4
sku;name;description;price
A100;"Garden Chair	foldable";"Lightweight	UV-resistant";49	90

After (correct semicolon delimiter):

sku	name	description	price
A100	Garden Chair, foldable	Lightweight, UV-resistant	49,90

How to fix: Open the raw file in a plain text editor (not Excel, which auto-parses). Look at the first few rows. The delimiter is usually obvious. If you're scripting, most CSV libraries let you specify the delimiter and quote character explicitly.

Step 3: Remove Header and Footer Junk Rows

Supplier exports often include rows that aren't data: a company name in row 1, an export timestamp in row 2, column headers in row 3, and a "Total: 847 products" row at the bottom.

Before:

ACME Industrial Supply Co. - Product Export
Generated: 2026-03-15 09:42:00
sku,name,price,stock
W-001,Widget A,12.50,200
W-002,Widget B,8.75,340
...
TOTAL,,21.25,540
"End of report"

After:

sku,name,price,stock
W-001,Widget A,12.50,200
W-002,Widget B,8.75,340

How to fix: Identify which row contains the actual column headers. Delete everything above it and everything below the last data row. In scripting terms: skip the first N rows, and stop reading when you hit a row where the SKU column is empty or contains a keyword like "total."

Step 4: Handle Empty vs. Missing Values

This one's subtle but causes real PIM import problems. An empty string, the text N/A, the number 0, and a truly absent value all mean different things — but suppliers mix them freely.

Before:

sku	weight_kg	color	lead_time_days
A100	2.5	red	14
A101	0	N/A
A102		-	0
A103	n/a	TBD	NULL

After:

sku	weight_kg	color	lead_time_days
A100	2.5	red	14
A101	0	(empty)	(empty)
A102	(empty)	(empty)	0
A103	(empty)	(empty)	(empty)

How to fix: Decide on a convention: empty string means "no value," and 0 means an actual zero. Then replace all placeholder values (N/A, n/a, -, NULL, TBD, none) with empty strings. Be careful with 0 — a weight of 0 is probably missing data, but a lead time of 0 might mean "ships immediately."

Step 5: Standardize Decimal Separators

A European supplier writes 1.299,50. Your PIM expects 1299.50. If you don't catch this, a price of €1,299.50 becomes €1.30 — or the import just fails.

Before:

sku	price	weight_kg	width_cm
A100	1.299,00	2,5	45,7
A101	849,90	1,2	30,0

After:

sku	price	weight_kg	width_cm
A100	1299.00	2.5	45.7
A101	849.90	1.2	30.0

How to fix: For European-format numbers: remove the thousands separator (period), then replace the decimal separator (comma) with a period. The order matters — do it the other way and you'll corrupt the data. Identify which format you're dealing with before applying any transformation.

Step 6: Fix Date Formats

Is 03/04/2026 March 4th or April 3rd? You genuinely cannot tell without knowing the supplier's locale. And your PIM almost certainly wants ISO 8601 (YYYY-MM-DD).

Before:

Supplier	Raw Date	Format
Supplier A (US)	03/04/2026	MM/DD/YYYY
Supplier B (DE)	04.03.2026	DD.MM.YYYY
Supplier C (UK)	04/03/2026	DD/MM/YYYY
Supplier D (JP)	2026-03-04	YYYY-MM-DD

All four represent the same date: March 4, 2026.

After:

2026-03-04

How to fix: Ask your supplier what date format they use — once. Document it. Then parse and reformat to ISO 8601 on every import. Never try to auto-detect date formats; the ambiguity between DD/MM and MM/DD is unresolvable from the data alone.

Step 7: Normalize Text Casing and Whitespace

Leading spaces, trailing spaces, double spaces, inconsistent capitalization. These create duplicate values in your PIM's filters and make search unreliable.

Before:

sku	color	category
A100	Red	Garden Furniture
A101	red	garden furniture
A102	RED	GARDEN FURNITURE
A103	red	Garden furniture

After:

sku	color	category
A100	red	Garden Furniture
A101	red	Garden Furniture
A102	red	Garden Furniture
A103	red	Garden Furniture

How to fix: Trim whitespace from all fields. Collapse multiple spaces into one. For attribute values used in filters (color, material, category), pick a casing convention — typically lowercase for codes, Title Case for display names — and apply it consistently.

Step 8: Validate Numeric Fields

Prices with currency symbols. Dimensions with unit suffixes. Percentages with the percent sign included. These are all strings pretending to be numbers, and your PIM's numeric fields will reject them.

Before:

sku	price	width	discount
A100	€129.90	45 cm	15%
A101	EUR 84.50	30.0cm	10 %
A102	$99.00	22 inches	20percent

After:

sku	price	width_cm	discount_pct
A100	129.90	45.0	15
A101	84.50	30.0	10
A102	99.00	55.9	20

How to fix: Strip currency symbols, unit suffixes, and percentage signs. If the column includes mixed units (cm and inches in the same column), you need to detect and convert, not just strip. Store the unit in the column name or a separate column, not in the value itself.

Step 9: Check for Duplicate SKUs or Merged Product Rows

Duplicate SKUs cause update conflicts. Merged rows — where a product with multiple variants was represented as one row in the supplier's system but needs to be multiple rows in yours — cause data loss.

Before:

sku	name	color	price
A100	Garden Chair	red	49.90
A100	Garden Chair	blue	49.90
A101	Side Table	red/blue/green	79.90

Row 1 and 2: duplicate SKU — same product, different color variants sharing one SKU. Row 3: multiple values crammed into one field.

After:

sku	name	color	price
A100-RED	Garden Chair	red	49.90
A100-BLU	Garden Chair	blue	49.90
A101-RED	Side Table	red	79.90
A101-BLU	Side Table	blue	79.90
A101-GRN	Side Table	green	79.90

How to fix: Run a duplicate check on your SKU column. For true duplicates (same SKU, different data), decide whether they're variants that need unique SKUs or accidental duplicates where one should be removed. For merged values (multiple colors in one field), split them into separate rows with distinct identifiers.

Step 10: Map Columns to Your PIM Schema

The final step: the supplier's column names almost certainly don't match your PIM's attribute codes. This is where you create the translation layer.

Before:

Supplier Column	PIM Attribute
Art.Nr.	sku
Produktname	name
Farbe	color
VK Preis (EUR)	price
Breite (cm)	width
Kat.	category
Lieferzeit Tage	lead_time
(not provided)	brand

How to fix: Create a column mapping document for each supplier. Map every supplier column to a PIM attribute. Identify PIM attributes that the supplier doesn't provide — you'll need to fill those from another source or set defaults. Drop supplier columns that you don't need (internal supplier fields, redundant data).

When to Automate This

If you import one CSV from one supplier once a quarter, a manual process works fine. Open the file, work through the checklist, clean it up, import it. Thirty minutes of work, done.

But the math changes fast:

5 suppliers, monthly updates = 60 cleanup sessions per year
10 suppliers, weekly updates = 520 cleanup sessions per year
Each session takes 20-45 minutes depending on file quality
Every session carries human error risk — a wrong decimal conversion, a missed duplicate, a column mapping done slightly differently than last time

The tipping point is usually around 3-5 suppliers with regular updates. At that point, the time spent on repetitive cleanup exceeds the time it would take to set up rules once and apply them automatically going forward.

More importantly, manual cleanup doesn't accumulate knowledge. The person who cleaned last month's file knows the quirks of Supplier B's CSV format — but that knowledge lives in their head, not in a system. When they're on vacation, someone else starts from scratch.

How FeedPrep Handles This

FeedPrep's Supplier Adapters let you configure steps 1 through 10 as rules for each supplier. The first time you import a supplier's file, you walk through the mapping — setting the encoding, delimiter, column mappings, value transformations, and validation rules. FeedPrep learns those rules and applies them to every future file from that supplier.

It's not magic and it's not instant. The first file still requires your input: you know your PIM schema, you know what "Farbe" means, you know that this particular supplier uses commas as decimal separators. But the second file, and the twentieth, and the hundredth — those flow through the same rules without manual intervention.

When something new shows up — a color value that isn't in your mapping, a column the supplier added — FeedPrep flags it in your Inbox for review instead of silently passing it through or silently dropping it. You make the decision once, and the rule updates for next time.

That's the difference between a process that resets every month and one that gets better over time.

How to Clean a Supplier CSV Before Importing Into Your PIM

Why Supplier CSVs Are Never Import-Ready

The 10-Step CSV Cleanup Checklist

Step 1: Check File Encoding

Step 2: Verify Delimiter and Quoting

Step 3: Remove Header and Footer Junk Rows

Step 4: Handle Empty vs. Missing Values

Step 5: Standardize Decimal Separators

Step 6: Fix Date Formats

Step 7: Normalize Text Casing and Whitespace

Step 8: Validate Numeric Fields

Step 9: Check for Duplicate SKUs or Merged Product Rows

Step 10: Map Columns to Your PIM Schema

When to Automate This

How FeedPrep Handles This

Stop Cleaning the Same CSV Every Month