Convert File Encoding with iconv

Some systems still export files in legacy character encodings. My bank, for instance, exports transaction CSVs in CP1250 (a Central European Windows encoding), which produces garbled characters when I import them into my UTF-8-based tooling.

The Command

iconv -f CP1250 -t UTF-8 < input.csv > output.csv

That’s it. The file is now UTF-8 encoded and ready for processing.

Breaking It Down

  • -f CP1250 - the source encoding (what the file currently is)
  • -t UTF-8 - the target encoding (what you want)
  • < input.csv - read from the original file
  • > output.csv - write to a new file
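
To check the command end to end, here is a minimal sketch that builds a two-character CP1250 sample and converts it. The temporary directory and the sample bytes are assumptions for illustration (0x9C is ś and 0xF3 is ó in CP1250); only the iconv invocation comes from the command above.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Write "ś;ó" as raw CP1250 bytes (octal escapes: 0x9C = \234, 0xF3 = \363).
printf '\234;\363\n' > "$workdir/input.csv"

# Same conversion as in the article, pointed at the sample file.
iconv -f CP1250 -t UTF-8 < "$workdir/input.csv" > "$workdir/output.csv"

# Dump the result as hex to confirm the bytes are now UTF-8.
od -An -tx1 "$workdir/output.csv"
```

If the conversion worked, the dump contains the two-byte UTF-8 sequences c5 9b (ś) and c3 b3 (ó) on either side of the 3b semicolon.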

Common Encodings You Might Encounter

Encoding      Where You’ll See It
CP1250        Central European Windows (Polish, Czech, Hungarian)
CP1252        Western European Windows
ISO-8859-1    Latin-1, older Western European systems
ISO-8859-2    Latin-2, Central European
UTF-16        Windows Unicode files, some APIs

Useful Variations

List all available encodings:

iconv -l
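
The layout of the iconv -l listing differs between iconv implementations, so searching it with grep can be brittle. A minimal sketch of an alternative is to attempt a conversion of empty input and look only at the exit status. This assumes iconv exits non-zero for an encoding name it does not recognise; has_encoding is a hypothetical helper name, not part of iconv.

```shell
# has_encoding NAME: succeed if iconv accepts NAME as a source encoding.
# Hypothetical helper; it converts empty input and only checks the exit status.
has_encoding() {
    iconv -f "$1" -t UTF-8 < /dev/null > /dev/null 2>&1
}

if has_encoding CP1250; then
    echo "CP1250 is supported"
else
    echo "CP1250 is not supported"
fi
```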

Guess the encoding of a file with the file command (the result is a heuristic, and on macOS the equivalent flag is -I):

file -i input.csv
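
Because file only makes an educated guess, a stricter check can be useful before converting. The sketch below assumes that decoding a file as UTF-8 with iconv fails on invalid byte sequences, which separates "already UTF-8" from "some legacy encoding". It cannot say which legacy encoding that is. is_utf8 is a hypothetical helper name, and the sample file is created only for illustration.

```shell
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

# Illustrative sample: a lone 0x9C byte (ś in CP1250) is not valid UTF-8.
sample=$(mktemp)
printf '\234\n' > "$sample"

if is_utf8 "$sample"; then
    echo "already UTF-8, nothing to do"
else
    echo "not UTF-8, convert it with iconv"
fi
```

Note that a pure ASCII file also passes this check, since ASCII is a subset of UTF-8.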

Convert in place using a temp file:

iconv -f CP1250 -t UTF-8 input.csv > input_utf8.csv && mv input_utf8.csv input.csv
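
If you do this often, the same idea can be wrapped in a small function. This is a sketch under a few assumptions: convert_in_place is a hypothetical name, mktemp is available, and the target is always UTF-8. It also removes the temp file when iconv fails, so a failed run leaves nothing behind.

```shell
# convert_in_place FROM FILE: re-encode FILE from encoding FROM to UTF-8.
# Hypothetical helper; the original is replaced only if iconv succeeds.
convert_in_place() {
    from=$1
    file=$2
    # Create the temp file next to the target so mv stays on one filesystem.
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if iconv -f "$from" -t UTF-8 < "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        # Conversion failed: discard the partial output, keep the original.
        rm -f "$tmp"
        return 1
    fi
}
```

Usage would look like convert_in_place CP1250 input.csv. One caveat: the replaced file takes on the temp file's permissions, so restore them with chmod if they matter.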

Why This Happens

UTF-8 has become the universal standard, but legacy systems - especially in banking, government, and enterprise software - often still use regional encodings from the Windows era. CP1250 was the default for Central European Windows installations, so many Polish and Czech systems still export in this format.

When you see garbled characters like Ĺ› instead of ś or Ă³ instead of ó, you’re likely looking at a CP1250 file being interpreted as UTF-8.
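
A garble like Ĺ› can be reproduced on purpose, which helps confirm which two encodings are involved. This sketch takes the UTF-8 bytes of ś (0xC5 0x9B), decodes them as CP1250, and then reverses the mistake. The variable names are illustrative.

```shell
# UTF-8 bytes for "ś", written with octal escapes (0xC5 = \305, 0x9B = \233).
# Decoding them as CP1250 turns each byte into its own character: "Ĺ›".
garbled=$(printf '\305\233' | iconv -f CP1250 -t UTF-8)
echo "$garbled"

# Reverse the mistake: re-encoding the garbled text as CP1250 yields the
# original UTF-8 bytes again.
repaired=$(printf '%s' "$garbled" | iconv -f UTF-8 -t CP1250)
echo "$repaired"
```

The same round trip is a quick way to test a hunch about any other pair of encodings: if re-encoding the garbled text gives back readable characters, the guess was right.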