How can you parse excel CSV data that contains linebreaks in the data?
According to aleske, a commenter in the documentation for PHP's fgetcsv function:
The PHP's CSV handling stuff is non-standard and contradicts with RFC4180, thus fgetcsv() cannot properly deal with files [that contain line breaks] ...
And he offered up the following function to get around this limitation:
function csvstring_to_array(&$string, $CSV_SEPARATOR = ';', $CSV_ENCLOSURE = '"', $CSV_LINEBREAK = "\n") {
$o = array();
$cnt = strlen($string);
$esc = false;
$escesc = false;
$num = 0;
$i = 0;
while ($i < $cnt) {
$s = $string[$i];
if ($s == $CSV_LINEBREAK) {
if ($esc) {
$o[$num] .= $s;
} else {
$i++;
break;
}
} elseif ($s == $CSV_SEPARATOR) {
if ($esc) {
$o[$num] .= $s;
} else {
$num++;
$esc = false;
$escesc = false;
}
} elseif ($s == $CSV_ENCLOSURE) {
if ($escesc) {
$o[$num] .= $CSV_ENCLOSURE;
$escesc = false;
}
if ($esc) {
$esc = false;
$escesc = true;
} else {
$esc = true;
$escesc = false;
}
} else {
if ($escesc) {
$o[$num] .= $CSV_ENCLOSURE;
$escesc = false;
}
$o[$num] .= $s;
}
$i++;
}
// $string = substr($string, $i);
return $o;
}
That looks like it will do the trick.
I found that you can use a normal CSV parser after you convert the CSV to unix format.
Here is a function that did the trick for me .
function dos2unix($s) {
$s = str_replace("\r\n", "\n", $s);
$s = str_replace("\r", "\n", $s);
$s = preg_replace("/\n{2,}/", "\n\n", $s);
return $s;
}
And a parsing function
function csvstring_to_array($string, $separatorChar = ',', $enclosureChar = '"', $newlineChar = PHP_EOL) {
// @author: Klemen Nagode
$string = dos2unix($string);
$array = array();
$size = strlen($string);
$columnIndex = 0;
$rowIndex = 0;
$fieldValue="";
$isEnclosured = false;
for($i=0; $i<$size;$i++) {
$char = $string{$i};
$addChar = "";
if($isEnclosured) {
if($char==$enclosureChar) {
if($i+1<$size && $string{$i+1}==$enclosureChar){
// escaped char
$addChar=$char;
$i++; // dont check next char
}else{
$isEnclosured = false;
}
}else {
$addChar=$char;
}
}else {
if($char==$enclosureChar) {
$isEnclosured = true;
}else {
if($char==$separatorChar) {
$array[$rowIndex][$columnIndex] = $fieldValue;
$fieldValue="";
$columnIndex++;
}elseif($char==$newlineChar) {
echo $char;
$array[$rowIndex][$columnIndex] = $fieldValue;
$fieldValue="";
$columnIndex=0;
$rowIndex++;
}else {
$addChar=$char;
}
}
}
if($addChar!=""){
$fieldValue.=$addChar;
}
}
if($fieldValue) { // save last field
$array[$rowIndex][$columnIndex] = $fieldValue;
}
return $array;
}
The problem is that the "\n" escape string doesn't evaluate to the same new line character that that Excel uses for its row delimiter. The ASCII character that Excel uses is ASCII 13. The following code will efficiently parse a .csv file that is passed in via the $file_get_contents ()method.
<?php
//variable to store filename of file
$filename = $_SERVER['DOCUMENT_ROOT'] . "/site/docs/boothmap.csv";
//read file in as string
$file = file_get_contents($filename);
//convert csv to array
//first to single dimensional array
$array1D = explode(chr(13),$file);
//create new array to hold 2d array
$array2D = array();
//iterate through 1 dimensional array and explode each value to the new array
foreach($array1D as &$row)
{
array_push($array2D, explode(',',$row));
}
//pop off empty last row of array2D
array_pop($array2D);
//iterate through $array2D building table of data
//start table with column headers
echo "<table border=\"1\">\n<tr>\n<th>Company</th>\n<th>Booth #</th>\n<th>Location</th>\n</tr>\n";
foreach ($array2D as &$row)
{
echo "<tr>\n";
foreach($row as &$subrow)
{
echo "<td>" . $subrow . "</td>\n";
}
echo "</tr>\n";
}
//close table
echo "</table>";