You just got a fresh CSV from a supplier. You open it in your PIM's import tool, hit upload, and… 47 errors. Half the rows failed validation. Some fields landed in the wrong columns. The prices look like phone numbers.
If this sounds familiar, you're not alone. Supplier CSVs are almost never import-ready. This guide gives you a concrete, step-by-step checklist for cleaning them up before they go anywhere near Akeneo, Pimcore, Salsify, or any other PIM system.
Why Supplier CSVs Are Never Import-Ready
Suppliers don't think about your PIM. They export data from their own systems — ERPs, spreadsheets, custom databases — and what you receive is shaped by their tools, not yours. Here's what typically goes wrong:
- Encoding mismatches. They saved it in Latin-1 or Windows-1252, your PIM expects UTF-8. Special characters turn into garbled sequences.
- Wrong delimiters. European suppliers often use semicolons. Some systems export tabs. Your PIM expects commas.
- Merged cells exported badly. What looked fine in Excel becomes empty rows and shifted columns in CSV.
- Inconsistent quoting. Some fields are quoted, some aren't. Fields containing the delimiter character break the column structure entirely.
- Junk rows. Company branding in row 1, subtotals at the bottom, blank rows in between.
None of this is malicious. It's just what happens when data crosses system boundaries. The fix is a consistent cleanup process, applied before every import.
The 10-Step CSV Cleanup Checklist
Work through these steps in order. Each one builds on the previous — there's no point mapping columns if half the rows are corrupted by an encoding issue.
Step 1: Check File Encoding
The most common silent destroyer of product data. If the encoding is wrong, everything downstream can break — special characters, accented letters, currency symbols.
What to look for:
üinstead ofü(UTF-8 bytes read as Latin-1)–instead of an em dash- A
character at the very start of the file (the UTF-8 BOM — invisible in most editors but breaks header detection)
Before:
name;description
Stühle Modell Zürich;Massivholz, Öko-zertifiziert
Opened as Latin-1, displayed as UTF-8:
name;description
Stühle Modell Zürich;Massivholz, Ãko-zertifiziert
After (correct UTF-8):
name;description
Stühle Modell Zürich;Massivholz, Öko-zertifiziert
How to fix: Open the file in a text editor that shows encoding (VS Code, Notepad++, Sublime Text). Convert to UTF-8 without BOM. On the command line: file -i supplier.csv tells you the current encoding, and iconv -f WINDOWS-1252 -t UTF-8 supplier.csv > supplier-utf8.csv converts it.
Step 2: Verify Delimiter and Quoting
A CSV is only a CSV if you know what separates the columns. Don't assume commas.
Before:
sku;name;description;price
A100;"Garden Chair, foldable";"Lightweight, UV-resistant";49,90
Imported with comma delimiter — columns split incorrectly:
| Column 1 | Column 2 | Column 3 | Column 4 |
|---|---|---|---|
| sku;name;description;price | |||
| A100;"Garden Chair | foldable";"Lightweight | UV-resistant";49 | 90 |
After (correct semicolon delimiter):
| sku | name | description | price |
|---|---|---|---|
| A100 | Garden Chair, foldable | Lightweight, UV-resistant | 49,90 |
How to fix: Open the raw file in a plain text editor (not Excel, which auto-parses). Look at the first few rows. The delimiter is usually obvious. If you're scripting, most CSV libraries let you specify the delimiter and quote character explicitly.
Step 3: Remove Header and Footer Junk Rows
Supplier exports often include rows that aren't data: a company name in row 1, an export timestamp in row 2, column headers in row 3, and a "Total: 847 products" row at the bottom.
Before:
ACME Industrial Supply Co. - Product Export
Generated: 2026-03-15 09:42:00
sku,name,price,stock
W-001,Widget A,12.50,200
W-002,Widget B,8.75,340
...
TOTAL,,21.25,540
"End of report"
After:
sku,name,price,stock
W-001,Widget A,12.50,200
W-002,Widget B,8.75,340
How to fix: Identify which row contains the actual column headers. Delete everything above it and everything below the last data row. In scripting terms: skip the first N rows, and stop reading when you hit a row where the SKU column is empty or contains a keyword like "total."
Step 4: Handle Empty vs. Missing Values
This one's subtle but causes real PIM import problems. An empty string, the text N/A, the number 0, and a truly absent value all mean different things — but suppliers mix them freely.
Before:
| sku | weight_kg | color | lead_time_days |
|---|---|---|---|
| A100 | 2.5 | red | 14 |
| A101 | 0 | N/A | |
| A102 | - | 0 | |
| A103 | n/a | TBD | NULL |
After:
| sku | weight_kg | color | lead_time_days |
|---|---|---|---|
| A100 | 2.5 | red | 14 |
| A101 | 0 | (empty) | (empty) |
| A102 | (empty) | (empty) | 0 |
| A103 | (empty) | (empty) | (empty) |
How to fix: Decide on a convention: empty string means "no value," and 0 means an actual zero. Then replace all placeholder values (N/A, n/a, -, NULL, TBD, none) with empty strings. Be careful with 0 — a weight of 0 is probably missing data, but a lead time of 0 might mean "ships immediately."
Step 5: Standardize Decimal Separators
A European supplier writes 1.299,50. Your PIM expects 1299.50. If you don't catch this, a price of €1,299.50 becomes €1.30 — or the import just fails.
Before:
| sku | price | weight_kg | width_cm |
|---|---|---|---|
| A100 | 1.299,00 | 2,5 | 45,7 |
| A101 | 849,90 | 1,2 | 30,0 |
After:
| sku | price | weight_kg | width_cm |
|---|---|---|---|
| A100 | 1299.00 | 2.5 | 45.7 |
| A101 | 849.90 | 1.2 | 30.0 |
How to fix: For European-format numbers: remove the thousands separator (period), then replace the decimal separator (comma) with a period. The order matters — do it the other way and you'll corrupt the data. Identify which format you're dealing with before applying any transformation.
Step 6: Fix Date Formats
Is 03/04/2026 March 4th or April 3rd? You genuinely cannot tell without knowing the supplier's locale. And your PIM almost certainly wants ISO 8601 (YYYY-MM-DD).
Before:
| Supplier | Raw Date | Format |
|---|---|---|
| Supplier A (US) | 03/04/2026 | MM/DD/YYYY |
| Supplier B (DE) | 04.03.2026 | DD.MM.YYYY |
| Supplier C (UK) | 04/03/2026 | DD/MM/YYYY |
| Supplier D (JP) | 2026-03-04 | YYYY-MM-DD |
All four represent the same date: March 4, 2026.
After:
2026-03-04
How to fix: Ask your supplier what date format they use — once. Document it. Then parse and reformat to ISO 8601 on every import. Never try to auto-detect date formats; the ambiguity between DD/MM and MM/DD is unresolvable from the data alone.
Step 7: Normalize Text Casing and Whitespace
Leading spaces, trailing spaces, double spaces, inconsistent capitalization. These create duplicate values in your PIM's filters and make search unreliable.
Before:
| sku | color | category |
|---|---|---|
| A100 | Red | Garden Furniture |
| A101 | red | garden furniture |
| A102 | RED | GARDEN FURNITURE |
| A103 | red | Garden furniture |
After:
| sku | color | category |
|---|---|---|
| A100 | red | Garden Furniture |
| A101 | red | Garden Furniture |
| A102 | red | Garden Furniture |
| A103 | red | Garden Furniture |
How to fix: Trim whitespace from all fields. Collapse multiple spaces into one. For attribute values used in filters (color, material, category), pick a casing convention — typically lowercase for codes, Title Case for display names — and apply it consistently.
Step 8: Validate Numeric Fields
Prices with currency symbols. Dimensions with unit suffixes. Percentages with the percent sign included. These are all strings pretending to be numbers, and your PIM's numeric fields will reject them.
Before:
| sku | price | width | discount |
|---|---|---|---|
| A100 | €129.90 | 45 cm | 15% |
| A101 | EUR 84.50 | 30.0cm | 10 % |
| A102 | $99.00 | 22 inches | 20percent |
After:
| sku | price | width_cm | discount_pct |
|---|---|---|---|
| A100 | 129.90 | 45.0 | 15 |
| A101 | 84.50 | 30.0 | 10 |
| A102 | 99.00 | 55.9 | 20 |
How to fix: Strip currency symbols, unit suffixes, and percentage signs. If the column includes mixed units (cm and inches in the same column), you need to detect and convert, not just strip. Store the unit in the column name or a separate column, not in the value itself.
Step 9: Check for Duplicate SKUs or Merged Product Rows
Duplicate SKUs cause update conflicts. Merged rows — where a product with multiple variants was represented as one row in the supplier's system but needs to be multiple rows in yours — cause data loss.
Before:
| sku | name | color | price |
|---|---|---|---|
| A100 | Garden Chair | red | 49.90 |
| A100 | Garden Chair | blue | 49.90 |
| A101 | Side Table | red/blue/green | 79.90 |
Row 1 and 2: duplicate SKU — same product, different color variants sharing one SKU. Row 3: multiple values crammed into one field.
After:
| sku | name | color | price |
|---|---|---|---|
| A100-RED | Garden Chair | red | 49.90 |
| A100-BLU | Garden Chair | blue | 49.90 |
| A101-RED | Side Table | red | 79.90 |
| A101-BLU | Side Table | blue | 79.90 |
| A101-GRN | Side Table | green | 79.90 |
How to fix: Run a duplicate check on your SKU column. For true duplicates (same SKU, different data), decide whether they're variants that need unique SKUs or accidental duplicates where one should be removed. For merged values (multiple colors in one field), split them into separate rows with distinct identifiers.
Step 10: Map Columns to Your PIM Schema
The final step: the supplier's column names almost certainly don't match your PIM's attribute codes. This is where you create the translation layer.
Before:
| Supplier Column | PIM Attribute |
|---|---|
| Art.Nr. | sku |
| Produktname | name |
| Farbe | color |
| VK Preis (EUR) | price |
| Breite (cm) | width |
| Kat. | category |
| Lieferzeit Tage | lead_time |
| (not provided) | brand |
How to fix: Create a column mapping document for each supplier. Map every supplier column to a PIM attribute. Identify PIM attributes that the supplier doesn't provide — you'll need to fill those from another source or set defaults. Drop supplier columns that you don't need (internal supplier fields, redundant data).
When to Automate This
If you import one CSV from one supplier once a quarter, a manual process works fine. Open the file, work through the checklist, clean it up, import it. Thirty minutes of work, done.
But the math changes fast:
- 5 suppliers, monthly updates = 60 cleanup sessions per year
- 10 suppliers, weekly updates = 520 cleanup sessions per year
- Each session takes 20-45 minutes depending on file quality
- Every session carries human error risk — a wrong decimal conversion, a missed duplicate, a column mapping done slightly differently than last time
The tipping point is usually around 3-5 suppliers with regular updates. At that point, the time spent on repetitive cleanup exceeds the time it would take to set up rules once and apply them automatically going forward.
More importantly, manual cleanup doesn't accumulate knowledge. The person who cleaned last month's file knows the quirks of Supplier B's CSV format — but that knowledge lives in their head, not in a system. When they're on vacation, someone else starts from scratch.
How FeedPrep Handles This
FeedPrep's Supplier Adapters let you configure steps 1 through 10 as rules for each supplier. The first time you import a supplier's file, you walk through the mapping — setting the encoding, delimiter, column mappings, value transformations, and validation rules. FeedPrep learns those rules and applies them to every future file from that supplier.
It's not magic and it's not instant. The first file still requires your input: you know your PIM schema, you know what "Farbe" means, you know that this particular supplier uses commas as decimal separators. But the second file, and the twentieth, and the hundredth — those flow through the same rules without manual intervention.
When something new shows up — a color value that isn't in your mapping, a column the supplier added — FeedPrep flags it in your Inbox for review instead of silently passing it through or silently dropping it. You make the decision once, and the rule updates for next time.
That's the difference between a process that resets every month and one that gets better over time.