Convert File Encoding with iconv
Some systems still export files in legacy character encodings. My bank, for instance, exports transaction CSVs in CP1250 (a Central European Windows encoding), which causes garbled characters when I try to import them into my UTF-8-based tooling.
The Command
iconv -f CP1250 -t UTF-8 < input.csv > output.csv
That’s it: output.csv is now UTF-8 encoded and ready for processing.
Breaking It Down
- `-f CP1250`: the source encoding (what the file currently is)
- `-t UTF-8`: the target encoding (what you want)
- `< input.csv`: read from the original file
- `> output.csv`: write to a new file
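To see what the conversion does at the byte level, here is a minimal sketch that builds a one-line CP1250 sample and runs the same command on it. The file names sample_cp1250.csv and sample_utf8.csv are made up for illustration, and the sample bytes are written with octal escapes so the snippet does not depend on your editor's encoding.

```shell
# "Łódź;100" in CP1250: Ł = A3, ó = F3, ź = 9F (one byte each).
printf '\243\363d\237;100\n' > sample_cp1250.csv

# Same command as above, applied to the sample file.
iconv -f CP1250 -t UTF-8 < sample_cp1250.csv > sample_utf8.csv

# Each accented letter grows from 1 byte to 2 bytes in UTF-8,
# so the file goes from 9 bytes to 12 bytes.
wc -c < sample_cp1250.csv
wc -c < sample_utf8.csv

# On a UTF-8 terminal this prints: Łódź;100
cat sample_utf8.csv
```

The ASCII characters (d, the semicolon, the digits) are identical in both encodings, which is why mostly-English files can look fine until the first accented letter shows up.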
Common Encodings You Might Encounter
| Encoding | Where You’ll See It |
|---|---|
| CP1250 | Central European Windows (Polish, Czech, Hungarian) |
| CP1252 | Western European Windows |
| ISO-8859-1 | Latin-1, older Western European systems |
| ISO-8859-2 | Latin-2, Central European |
| UTF-16 | Windows Unicode files, some APIs |
Useful Variations
List all available encodings:
iconv -l
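Guess the encoding of a file (requires the file command; on macOS the flag is -I). For 8-bit encodings such as CP1250 the result is only a heuristic, often iso-8859-1 or unknown-8bit: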
file -i input.csv
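A complementary check is whether a file is already valid UTF-8, which tells you if any conversion is needed at all. The sketch below relies on iconv exiting with a non-zero status when it hits an invalid input sequence. The helper name is_utf8 and the demo file names are hypothetical.

```shell
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8 (hypothetical helper).
is_utf8() {
    iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

# Two throwaway files containing "złoty": one UTF-8, one CP1250.
printf 'z\305\202oty\n' > demo_utf8.txt     # ł = C5 82 in UTF-8
printf 'z\263oty\n'     > demo_cp1250.txt   # ł = B3 in CP1250

for f in demo_utf8.txt demo_cp1250.txt; do
    if is_utf8 "$f"; then
        echo "$f: valid UTF-8, no conversion needed"
    else
        echo "$f: not UTF-8, needs converting from a legacy encoding"
    fi
done
```

A passing check does not prove the file was meant to be UTF-8 (pure ASCII passes too), and a failing check does not say which legacy encoding is in use, so the source encoding still has to come from knowing where the file originated.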
Convert in place using a temp file (redirecting straight back to input.csv would truncate it before iconv reads it):
iconv -f CP1250 -t UTF-8 input.csv > input_utf8.csv && mv input_utf8.csv input.csv
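If you do this often, the temp-file pattern can be wrapped in a small function. This is a sketch, not a polished tool: the name to_utf8_inplace is hypothetical, it assumes mktemp is available (common, but not part of POSIX), and it only replaces the original when iconv succeeds.

```shell
# to_utf8_inplace FROM FILE: re-encode FILE from encoding FROM to UTF-8.
# Hypothetical helper; the original is left untouched if iconv fails.
to_utf8_inplace() {
    from=$1
    file=$2
    # Create the temp file next to the original so mv stays on one filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if iconv -f "$from" -t UTF-8 < "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Demo: "Poznań" in CP1250 (ń = F1), converted in place.
printf 'Pozna\361\n' > demo.csv
to_utf8_inplace CP1250 demo.csv && echo "converted demo.csv"
```

Two caveats: the replaced file inherits the temp file's permissions, which mktemp keeps restrictive, and running the function twice on the same file double-encodes it, because iconv has no way to know the input is already UTF-8.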
Why This Happens
UTF-8 has become the universal standard, but legacy systems - especially in banking, government, and enterprise software - often still use regional encodings from the Windows era. CP1250 was the default for Central European Windows installations, so many Polish and Czech systems still export in this format.
When you see garbled characters like Ĺ› instead of ś or Ă³ instead of ó, you’re likely looking at a CP1250 file being interpreted as UTF-8.
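The garbling can be reproduced with iconv itself, which is a handy way to confirm which two encodings are involved. The sketch below writes ś and ó as UTF-8 bytes, deliberately misreads them as CP1250, and then reverses the mistake. The file names are made up for illustration.

```shell
# ś and ó as UTF-8 bytes (C5 9B and C3 B3), written with octal escapes.
printf '\305\233 \303\263\n' > good_utf8.txt

# Misread those bytes as CP1250: each UTF-8 byte becomes its own character.
iconv -f CP1250 -t UTF-8 < good_utf8.txt > garbled.txt
cat garbled.txt    # on a UTF-8 terminal this prints: Ĺ› Ăł

# Undo it by running the mistaken conversion in reverse.
iconv -f UTF-8 -t CP1250 < garbled.txt > repaired.txt
cmp good_utf8.txt repaired.txt && echo "repaired"
```

The reverse step only works while every garbled character still maps back to a CP1250 byte. If the text has passed through a tool that dropped or replaced characters, the original bytes may not be recoverable.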