On the fly generation of zip files

2014-06-12

Introduction

Let's consider the following situation: you have built a file repository system or a photo album in CakePHP and you would like the user to be able to download all the files or photos in a specific folder as one convenient zip file. In principle, you have several options for how to proceed:

  • Pre-generate the zip file and provide a link to that file
  • Generate the zip file on request and give the link to the user
  • Generate the zip file on the fly

Here, I will discuss why the last option is most likely the best and how to implement it in CakePHP.

Without any doubt, the first two options require both computational time and disk space. If you pre-generate the zip file, some time is spent adding each new file or photo to the zip file, so uploading new content takes longer. Moreover, you will need roughly 1.5x to 2x the normal amount of storage space, as you need to store both the raw images and the zip file, and formats that are already compressed (such as JPEG) shrink very little when zipped.

Generating the zip file only when a user requests it would alleviate this somewhat, as you only need storage space while a request is being served. You could, for instance, use a temporary file system for this. The problem with this approach is that the user has to wait for the zip file to be created before it can be offered for download.

Generating the zip file on the fly has several benefits. First, the download starts immediately, so the user does not have to wait. Second, because the data is sent straight to the user, it never needs to be saved to disk. The zip compression will most likely be fast enough that download speed is not noticeably reduced, although this of course depends on the specifications of your system. The only major drawback is that the processor has to work harder during the download, so multiple simultaneous downloads could lead to very high load averages.

Implementation

We assume for the sake of simplicity that you have a method in your controller that provides an array of the locations of all the files which need to be sent to the user. In the example below, we have a Folder model with a hasMany association to a File class. We have chosen these names purely for this example. For a real application I would not recommend using a class named 'File', as a class with that name already exists in the CakePHP library!


public function zip($id = null) {
    $this->layout = 'empty';
    $this->Folder->recursive = 1;
    $this->set('folder', $this->Folder->read(null, $id));
}

Note that the layout is set to 'empty', a layout file that adds nothing around the view output (i.e. effectively no layout).

Our view file looks like the following:


<?php

header('Content-Type: application/zip');
header('Content-disposition: attachment; filename="'.$folder['Folder']['name'].'.zip"');
header('Content-Transfer-Encoding: binary');

mb_http_output('pass');

ob_clean();
flush();

// Build a shell-safe, space-separated list of the file paths.
$files = '';
foreach ($folder['File'] as $file) {
    $files .= escapeshellarg($file['path']).' ';
}

$fp = popen('zip -r - '.$files, 'r');

$bufsize = 8192;
$buff = '';

while (!feof($fp)) {
    $buff = fread($fp, $bufsize);
    echo $buff;
    // Send the chunk to the client right away.
    ob_flush();
    flush();
}

pclose($fp);

?>

Let us go through the code. The first three header() calls inform the browser that we are going to send a zip file as binary data, and we can already give the file a name, which we conveniently take from our Folder object. The mb_http_output('pass'); line ensures that PHP does not convert the encoding of the output. ob_clean() discards anything already sitting in PHP's output buffer, and flush() pushes the system write buffers out to the client.
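Since the folder name ends up inside a quoted header value, it may be worth cleaning it first. The following is a minimal sketch, not part of the original view: zipDownloadHeaders() is a hypothetical helper name, and the choice to replace every unusual character with an underscore is an assumption you may want to relax.

```php
<?php
// Hypothetical helper (not a CakePHP function): builds the header lines
// for a zip download with a sanitized file name.
function zipDownloadHeaders($name) {
    // Assumption: anything outside letters, digits, dot, underscore and
    // hyphen is collapsed into a single underscore.
    $safe = preg_replace('/[^A-Za-z0-9._-]+/', '_', (string) $name);
    if ($safe === null || $safe === '') {
        $safe = 'download'; // fallback when nothing usable is left
    }
    return array(
        'Content-Type: application/zip',
        'Content-Disposition: attachment; filename="' . $safe . '.zip"',
    );
}
```

In the view, each returned line could then be passed to header() before any output is sent, for example with zipDownloadHeaders($folder['Folder']['name']).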

Next, we convert the array of file locations to a string. We use this string to tell the zip program (make sure it is installed on your operating system) which files to compress. Because '-' is given as the archive name, zip does not write a zip file to disk but sends the archive to standard output as a stream, which is conveniently piped to PHP using the popen function and referenced by a resource handle.
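Because the file locations are handed to a shell, paths with spaces or quotes need to be escaped. A minimal sketch of building the command line is shown here. buildZipCommand() is a hypothetical helper name, and refusing an empty list is my own assumption rather than something the original code does.

```php
<?php
// Hypothetical helper: turns a list of file paths into a zip command
// that writes the archive to standard output ("-" as archive name).
function buildZipCommand(array $paths) {
    if (count($paths) === 0) {
        // Assumption: running zip without any input files is never wanted.
        throw new InvalidArgumentException('No files to compress.');
    }
    // escapeshellarg() quotes each path so the shell sees one argument.
    $args = array_map('escapeshellarg', $paths);
    return 'zip -r - ' . implode(' ', $args);
}
```

The result can be passed directly to popen($command, 'r'), exactly as in the view above.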

Now comes the interesting part. Via this resource handle, a chunk of data is read into the read buffer (its size is set by $bufsize, here 8 KB) and sent to the browser, with the output buffer flushed on every pass. In other words, the download is 'fed' in segments of 8 KB. The loop continues until an end-of-file (EOF) is encountered, after which the pipe is closed. This way, we need only a very small amount of memory, because we fill a chunk of 8 KB, pass it to the browser and then reuse the memory. If we let the zip program stream its complete output into memory and then echoed it, we would most likely hit a memory allocation error for large folders.
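The chunked loop can also be written as a small reusable function, which makes it easy to try out without a browser. This is only a sketch: streamInChunks() and its callback parameter are hypothetical names, and it works on any readable stream resource, not just a pipe from popen.

```php
<?php
// Hypothetical helper: copies a readable stream to a callback in chunks
// of at most $bufsize bytes, so only one chunk is in memory at a time.
// Returns the total number of bytes passed on.
function streamInChunks($handle, $bufsize, $emit) {
    $total = 0;
    while (!feof($handle)) {
        $chunk = fread($handle, $bufsize);
        if ($chunk === false) {
            break; // read error: stop instead of looping forever
        }
        if ($chunk === '') {
            continue; // nothing read this time; feof() decides when to stop
        }
        call_user_func($emit, $chunk);
        $total += strlen($chunk);
    }
    return $total;
}
```

In the view, the callback would echo the chunk and then flush, for instance streamInChunks($fp, 8192, function ($chunk) { echo $chunk; flush(); });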

If you have questions or comments, feel free to drop a line!
