Encoding when exporting in CSV

Hi!

I’m running a Python script to format an exported CSV file into a good-looking Excel workbook.

Sometimes the export comes out UTF-8 encoded and sometimes ANSI encoded. That messes with my code.

Does anyone know a way to make it always export in UTF-8?

Thank you!

Hello Olivier, welcome to the forum!

Some information is missing from your question…

The UTF-8 encoding can be lost in several places.

When reading the response from the REST API
When converting your file to CSV
When transferring it to Excel
etc.

Do you save a text file as CSV that you then open in Excel? Is your CSV UTF-8 encoded and prefixed with a BOM?
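If you want to check that from Python rather than in a text editor, here is a quick sketch (the filenames are throwaway examples for the demo):

```python
import codecs

def has_utf8_bom(path):
    # True if the file starts with the UTF-8 BOM (bytes EF BB BF).
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8

# Demo: one file saved with a BOM, one without ("with_bom.csv" and
# "no_bom.csv" are made-up names for this sketch).
with open("with_bom.csv", "w", encoding="utf-8-sig") as f:
    f.write("id,note\n")
with open("no_bom.csv", "w", encoding="utf-8") as f:
    f.write("id,note\n")

print(has_utf8_bom("with_bom.csv"))  # True
print(has_utf8_bom("no_bom.csv"))    # False
```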

Here in Québec, where we use accented characters, we resolved almost all our UTF-8 issues by making sure all our source files are BOM-prefixed UTF-8. I hope you can do that in Python; I don’t know.
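It turns out Python can: its "utf-8-sig" codec writes the BOM for you when saving a file. A minimal sketch ("source.csv" is a placeholder name):

```python
# Writing with encoding="utf-8-sig" prefixes the file with the
# UTF-8 BOM (EF BB BF). "source.csv" is a placeholder filename.
with open("source.csv", "w", encoding="utf-8-sig", newline="") as f:
    f.write("tâche,note\n")

# The first three bytes are the BOM.
with open("source.csv", "rb") as f:
    print(f.read(3))  # b'\xef\xbb\xbf'
```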

Without more information about all the conversions that happen between the REST call and the Excel file, we can’t help much.

So there are a lot of places you should look at.

Frédéric.

What I typically do is export the project via the export tool and save it (the file is a CSV when downloaded, and I keep it that way). I then check the encoding using Notepad++. At that point, nearly all the projects I export are UTF-8 encoded, but some are ANSI.

The problem is not about accents or some characters coming out wrong (like an apostrophe being displayed in Excel as Â); it is that some tasks have notes, and those notes typically span more than one line in the CSV file.

The Python script analyses each task separately.

To do that, I use csv.reader and specify an encoding: a “try” block with UTF-8, some lines of code that use indexes and will raise an IndexError if a task is not fully captured on one line, and an “except” block that retries with another encoding. When the file is UTF-8 encoded, the notes, even when they span multiple lines, are still treated as a single field and can be processed with a “for line in reader” loop. The problem arises when the encoding is ANSI, which treats each new line of the CSV file as a new task (since each line is supposed to be one task).
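For what it’s worth, here is a sketch of a variant of that fallback that uses the decode error itself as the signal instead of an IndexError (not your actual script; "tasks.csv" is a made-up name). Note the newline="" argument, which the csv module requires so that newlines embedded in quoted fields are handled correctly:

```python
import csv

def read_tasks(path):
    # Try UTF-8 first ("utf-8-sig" also swallows a BOM if present),
    # then fall back to cp1252, a common meaning of "ANSI" on Windows.
    for enc in ("utf-8-sig", "cp1252"):
        try:
            with open(path, encoding=enc, newline="") as f:
                return list(csv.reader(f))
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode " + path)

# Demo: a task with a quoted multi-line note stays one row.
with open("tasks.csv", "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerows([
        ["task", "note"],
        ["Task 1", "line one\nline two"],
    ])

rows = read_tasks("tasks.csv")
print(len(rows))  # 2: the header row plus one task
```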

Thank you! I hope that helps.

This trouble has nothing to do with the API; we may need to move this discussion to another topic…

Some people on this forum can do that, I think.

But I’ll try to help, because here in French we’ve had the same trouble several times!

When you look at the encoding of your file in Notepad++, if it says just UTF-8 rather than UTF-8 BOM, Notepad++ is guessing the encoding based on the content.

Maybe your Python parser does the same…

Before processing your exported file, you can try converting it to UTF-8 BOM in Notepad++. This adds three special bytes at the beginning of the file that tell any reader the file is definitely UTF-8, so your parser does not have to guess the encoding from the content.
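If you would rather do that conversion in Python than in Notepad++, here is a sketch, under the assumption that your "ANSI" files are Windows-1252 (the usual meaning of ANSI on Western-locale Windows); the filenames are placeholders:

```python
def to_utf8_bom(src, dst):
    # Try UTF-8 first ("utf-8-sig" also swallows an existing BOM);
    # fall back to Windows-1252 if the bytes are not valid UTF-8.
    raw = open(src, "rb").read()
    try:
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        text = raw.decode("cp1252")
    # Re-save with a BOM so no downstream tool has to guess.
    with open(dst, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)

# Demo: an "ANSI" (cp1252) export becomes UTF-8 with BOM.
with open("export.csv", "wb") as f:
    f.write("tâche,note\n".encode("cp1252"))
to_utf8_bom("export.csv", "export_utf8.csv")
print(open("export_utf8.csv", "rb").read(3))  # b'\xef\xbb\xbf'
```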