Say you have a CSV file that you want to parse with PHP. No problem, PHP has a function for everything! At a beginner-to-intermediate level of experience, we might be tempted to reach for str_getcsv() as a quick way to read CSV files.
Before we get started, I’m using PHP 7.3.14 on Ubuntu 18.04.3 LTS (Bionic Beaver).
Let’s start with a simple CSV file:
name,age,phone,city
Bob,44,5555555,Boston
Jane,36,5555556,Philadelphia
Paco,54,5555557,Denver
Now let’s parse that file! For simplicity we’ll just print the data out with var_export() to examine what’s happening.
<?php
$data = file_get_contents('people.csv');
var_export(str_getcsv($data));
What do we get?
$ php readcsv.php
array (
  0 => 'name',
  1 => 'age',
  2 => 'phone',
  3 => 'city
Bob',
  4 => '44',
  5 => '5555555',
  6 => 'Boston
Jane',
  7 => '36',
  8 => '5555556',
  9 => 'Philadelphia
Paco',
  10 => '54',
  11 => '5555557',
  12 => 'Denver',
)
That’s definitely not correct. CSV is two-dimensional, but I received a single flat array. And worse: the field at the end of each row got merged with the first field of the next row!
(Note that I chose var_export() here to make it as clear as possible. print_r() doesn’t show the issue clearly enough, and var_dump() adds extra info that actually makes it harder to understand.)
It’s pretty obvious what went wrong here. The function ignored the linebreaks and read all of the data as a single row.
Let’s check the documentation and see if there’s a parameter that could help us:
str_getcsv ( string $input [, string $delimiter = "," [, string $enclosure = '"' [, string $escape = "\\" ]]] ) : array
Well, no. We can provide a delimiter, which is the field separator; an enclosure (usually a double quote), which lets us wrap fields that contain the delimiter so that it isn’t treated as a field separator; and an escape character, which lets us use the enclosure character inside an enclosed field.
So no, there’s no parameter for line separators in the function designed to parse CSV. It can only parse one row.
Yes, that’s correct.
PHP’s function for parsing CSV only parses one row of CSV.
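To make those parameters concrete, here is a minimal sketch of str_getcsv() doing the one job it does handle: a single row. The sample row is made up for illustration, and the three optional arguments are passed explicitly with their documented defaults.

```php
<?php
// One CSV row (illustrative data, not from the file above):
// the second field contains the delimiter, and the third contains a
// doubled enclosure character, the usual CSV way to write a literal quote.
$row = 'Bob,"Boston, MA","He said ""hi"""';

// delimiter = ",", enclosure = '"', escape = "\\" (the documented defaults)
$fields = str_getcsv($row, ',', '"', '\\');

var_export($fields);
```

Within one row the enclosure works as advertised: the comma inside "Boston, MA" stays in the field. The trouble only starts when the input holds more than one row.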
Maybe all is not lost. The official documentation’s “User Contributed Notes” are full of helpful suggestions.
Let’s try the most upvoted suggestion:
$ cat readcsv.php
<?php
$csv = array_map('str_getcsv', file('data.csv'));
var_export($csv);
$ php readcsv.php
array (
  0 => 
  array (
    0 => 'name',
    1 => 'age',
    2 => 'phone',
    3 => 'city',
  ),
  1 => 
  array (
    0 => 'Bob',
    1 => '44',
    2 => '5555555',
    3 => 'Boston',
  ),
  2 => 
  array (
    0 => 'Jane',
    1 => '36',
    2 => '5555556',
    3 => 'Philadelphia',
  ),
  3 => 
  array (
    0 => 'Paco',
    1 => '54',
    2 => '5555557',
    3 => 'Denver',
  ),
)
Hey, that worked!
This worked because file() returns an array of lines, which are then fed through array_map() to be parsed individually with str_getcsv().
But what if there’s a line-break within a field?
Let’s tweak our data to find out (yes, a newline in a phone number is contrived, it doesn’t matter; the principle doesn’t change):
name,age,phone,city
Bob,44,"555
5555",Boston
Jane,36,5555556,Philadelphia
Paco,54,5555557,Denver
$ php readcsv.php
array (
  0 => 
  array (
    0 => 'name',
    1 => 'age',
    2 => 'phone',
    3 => 'city',
  ),
  1 => 
  array (
    0 => 'Bob',
    1 => '44',
    2 => '555
',
  ),
  2 => 
  array (
    0 => '5555"',
    1 => 'Boston',
  ),
  3 => 
  array (
    0 => 'Jane',
    1 => '36',
    2 => '5555556',
    3 => 'Philadelphia',
  ),
  4 => 
  array (
    0 => 'Paco',
    1 => '54',
    2 => '5555557',
    3 => 'Denver',
  ),
)
Since we’re splitting on lines before parsing, str_getcsv() is already receiving bad data so we can’t hope to fix it.
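A small sketch of what file() actually hands over may make this clearer. It rebuilds the tweaked data as a string and writes it to a temporary file (the tempnam() step is only there to keep the example self-contained), then looks at the raw lines before any CSV parsing happens.

```php
<?php
// The tweaked data, with a newline inside the quoted phone field.
$csv = "name,age,phone,city\n"
     . "Bob,44,\"555\n5555\",Boston\n"
     . "Jane,36,5555556,Philadelphia\n"
     . "Paco,54,5555557,Denver\n";

// file() needs a path, so write the string to a temporary file first.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, $csv);

$lines = file($path);
unlink($path);

// Four CSV records, but file() sees five physical lines.
echo count($lines), "\n";

// Bob's record has already been cut in two, in the middle of a quoted field.
var_export($lines[1]);
echo "\n";
var_export($lines[2]);
echo "\n";
```

file() knows nothing about quoting, so the opening quote ends up on one line and the closing quote on the next. No per-line parser can put them back together.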
To be fair, there’s actually an editor’s note on this pointing out that it doesn’t work.
Let’s try some others.
There’s a clever approach in a couple of Stack Overflow posts, as well as on the PHP page for str_getcsv(), that uses str_getcsv() itself to split the input into rows, with a newline as the delimiter. I can see why this seems promising: the function is supposed to parse CSV, so maybe it will handle rows correctly. Let’s try it out:
<?php
$CsvString = file_get_contents('data.csv');
$Data = str_getcsv($CsvString, "\n");
foreach($Data as &$Row) $Row = str_getcsv($Row, ",");
var_export($Data);
Nope. Same exact result.
One of my colleagues proposed splitting the entire CSV at once and then just reading n columns for each “row”. This will not work either, because row separators will be ignored and the end of each row will be combined with the beginning of the next row.
It may be possible to reconstruct the rows by combining rows whenever you read fewer than the correct number of columns. I didn’t experiment with this, and I’m not sure whether it would work as expected, but most likely it will be both slower and more complex than using fgetcsv() or writing your own correct parser.
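For comparison, here is a minimal sketch of the fgetcsv() route. readCsvFile() is a hypothetical helper name, not a PHP built-in, and the demo writes the tweaked data to a temporary file only so the example can run on its own. fgetcsv() reads from a file handle, so it can keep reading across physical lines until a quoted field is closed.

```php
<?php
/**
 * Read a whole CSV file into an array of rows using fgetcsv().
 * Hypothetical helper for illustration; adjust error handling to taste.
 */
function readCsvFile(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $rows = [];
    // fgetcsv() returns false at end of file (or on error).
    // A length of 0 means "no line length limit".
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        // A blank line comes back as an array holding a single null; skip it.
        if ($row === [null]) {
            continue;
        }
        $rows[] = $row;
    }
    fclose($handle);

    return $rows;
}

// Demo with the tweaked data (newline inside the quoted phone field).
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents(
    $path,
    "name,age,phone,city\n"
    . "Bob,44,\"555\n5555\",Boston\n"
    . "Jane,36,5555556,Philadelphia\n"
    . "Paco,54,5555557,Denver\n"
);

$rows = readCsvFile($path);
unlink($path);

var_export($rows[1]);
echo "\n";
```

In this sketch Bob’s row comes back as four fields, with the newline preserved inside the phone field, and the file yields four rows rather than five.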
What if I don’t care about the cases where it doesn’t work?
Sure, if you don’t need multiline fields in your CSVs then most of these options will work. My experience, however, is that in the real world data is often not exactly what you expect. While you may start out with the vendor guaranteeing the file will never have newlines, what if they introduce a bug, or add a new field with newlines, or someone makes a mistake in data entry?
Maybe you just return an error in those cases? That’s part of the problem. Depending on how you fail to read the data, you may end up with data that is wrong but “works”, or code that fails sporadically in the middle of the night or on your day off.
Don’t take that chance.
str_getcsv() is broken. Don’t use it.