File compression and decompression in C using ZLib


I am Jyotiprakash, a deeply driven computer systems engineer, software developer, teacher, and philosopher. With a decade of professional experience, I have contributed to various cutting-edge software products in network security, mobile apps, and healthcare software at renowned companies like Oracle, Yahoo, and Epic. My academic journey has taken me to prestigious institutions such as the University of Wisconsin-Madison and BITS Pilani in India, where I consistently ranked among the top of my class.

At my core, I am a computer enthusiast with a profound interest in understanding the intricacies of computer programming. My skills are not limited to application programming in Java; I have also delved deeply into computer hardware, learning about various architectures, low-level assembly programming, Linux kernel implementation, and writing device drivers. The contributions of Linus Torvalds, Ken Thompson, and Dennis Ritchie—who revolutionized the computer industry—inspire me. I believe that real contributions to computer science are made by mastering all levels of abstraction and understanding systems inside out.

In addition to my professional pursuits, I am passionate about teaching and sharing knowledge. I have spent two years as a teaching assistant at UW Madison, where I taught complex concepts in operating systems, computer graphics, and data structures to both graduate and undergraduate students. Currently, I am an assistant professor at KIIT, Bhubaneswar, where I continue to teach computer science to undergraduate and graduate students. I am also working on writing a few free books on systems programming, as I believe in freely sharing knowledge to empower others.

To create a C program that uses Zlib to compress or decompress a file based on command-line arguments, you need to follow these steps:

  1. Install Zlib for Development on Ubuntu:

    • Open a terminal and run the following command to install Zlib development libraries:

        sudo apt-get install zlib1g-dev
      
  2. Write the C Program:

    • The program will use Zlib functions to compress or decompress files.

    • It takes three arguments: the input filename, the operation ('compress' or 'decompress'), and the output filename.

  3. Compile the Program with GCC:

    • Use the -lz flag to link against Zlib.

Let's break down each step in detail.

Step 1: Install Zlib Development Libraries

Run this command in the terminal:

sudo apt-get install zlib1g-dev

This command installs the Zlib development libraries and headers necessary for compiling programs that use Zlib.

Step 2: Writing the C Program

Here is an example program that accomplishes the task:

#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <zlib.h>

#define CHUNK 16384

void compressFile(FILE *source, FILE *dest) {
    int ret, flush;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];

    // Initialize the zlib stream for compression
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    ret = deflateInit(&strm, Z_DEFAULT_COMPRESSION);
    if (ret != Z_OK) return;

    // Compress until end of file
    do {
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)deflateEnd(&strm);
            return;
        }
        flush = feof(source) ? Z_FINISH : Z_NO_FLUSH;
        strm.next_in = in;

        // Run deflate() on input until output buffer not full
        do {
            strm.avail_out = CHUNK;
            strm.next_out = out;
            ret = deflate(&strm, flush);
            assert(ret != Z_STREAM_ERROR);
            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)deflateEnd(&strm);
                return;
            }
        } while (strm.avail_out == 0);
        assert(strm.avail_in == 0);
    } while (flush != Z_FINISH);
    assert(ret == Z_STREAM_END);

    // Clean up
    (void)deflateEnd(&strm);
}

void decompressFile(FILE *source, FILE *dest) {
    int ret;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];

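    // Initialize the zlib stream for decompression.
    // zlib requires next_in and avail_in to be set before inflateInit().
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    strm.avail_in = 0;
    strm.next_in = Z_NULL;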
    ret = inflateInit(&strm);
    if (ret != Z_OK) return;

    // Decompress until deflate stream ends or end of file
    do {
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)inflateEnd(&strm);
            return;
        }
        if (strm.avail_in == 0) break;
        strm.next_in = in;

        // Run inflate() on input until output buffer not full
        do {
            strm.avail_out = CHUNK;
            strm.next_out = out;
            ret = inflate(&strm, Z_NO_FLUSH);
            assert(ret != Z_STREAM_ERROR);
            switch (ret) {
                case Z_NEED_DICT:
                case Z_DATA_ERROR:
                case Z_MEM_ERROR:
                    (void)inflateEnd(&strm);
                    return;
            }
            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)inflateEnd(&strm);
                return;
            }
        } while (strm.avail_out == 0);

        // Done when inflate() says it's done
    } while (ret != Z_STREAM_END);

    // Clean up
    (void)inflateEnd(&strm);
}

int main(int argc, char **argv) {
    if (argc != 4) {
        fprintf(stderr, "Usage: %s <input file> <compress|decompress> <output file>\n", argv[0]);
        return 1;
    }

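    FILE *inFile = fopen(argv[1], "rb");
    FILE *outFile = fopen(argv[3], "wb");

    if (inFile == NULL || outFile == NULL) {
        fprintf(stderr, "Could not open files\n");
        // Close whichever file did open before exiting
        if (inFile != NULL) fclose(inFile);
        if (outFile != NULL) fclose(outFile);
        return 1;
    }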

    if (strcmp(argv[2], "compress") == 0) {
        compressFile(inFile, outFile);
    } else if (strcmp(argv[2], "decompress") == 0) {
        decompressFile(inFile, outFile);
    } else {
        fprintf(stderr, "Invalid operation\n");
        fclose(inFile);
        fclose(outFile);
        return 1;
    }

    fclose(inFile);
    fclose(outFile);
    return 0;
}

This C program is designed to either compress or decompress a file using the Zlib library. It takes three command-line arguments: the input file name, the operation (either "compress" or "decompress"), and the output file name. Let's break down the code in detail:

Includes and Macro Definition

  • #include statements: Include standard headers and the Zlib header.

    • <stdio.h>: Standard input/output functions.

    • <string.h>: String handling functions.

    • <assert.h>: Provides the assert macro for debugging.

    • <zlib.h>: Zlib library for compression/decompression functions.

  • #define CHUNK 16384: Defines a macro for the size of the buffer used in compression/decompression. Here, it's set to 16,384 bytes.

Function: compressFile(FILE *source, FILE *dest)

This function compresses the data read from source and writes the compressed data to dest.

  • Local Variables:

    • z_stream strm: Struct used by Zlib to maintain compression state.

    • unsigned char in[CHUNK], out[CHUNK]: Buffers for input and output data.

    • int ret, flush: Control variables for the compression loop and return status.

    • unsigned have: The number of compressed bytes in the out buffer that are ready to be written after each deflate call.

  • Initialization:

    • Initializes the z_stream and checks if deflateInit was successful.
  • Compression Loop:

    • Reads data from source and checks for file errors.

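    • Sets flush to Z_FINISH once feof reports the end of the input, and to Z_NO_FLUSH otherwise.

    • Calls deflate in an inner loop, writing each filled out buffer to dest, until deflate leaves free space in out (which means the current input chunk has been fully consumed).

    • Continues until the final chunk has been processed (flush is Z_FINISH), at which point deflate returns Z_STREAM_END.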

Function: decompressFile(FILE *source, FILE *dest)

This function decompresses data from source and writes the decompressed data to dest.

  • Local Variables: Similar to compressFile, but for decompression.

  • Initialization:

    • Initializes the z_stream for decompression and checks if inflateInit was successful.
  • Decompression Loop:

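    • Reads a chunk from source, checks for read errors, and stops if no more input is available.

    • Calls inflate in an inner loop and writes each batch of decompressed bytes to dest.

    • Cleans up and returns early on Z_NEED_DICT, Z_DATA_ERROR, or Z_MEM_ERROR; the outer loop ends when inflate returns Z_STREAM_END.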

Function: main(int argc, char **argv)

This is the entry point of the program.

  • Argument Check:

    • Checks if the program received exactly 4 arguments (including the program name).
  • File Operations:

    • Opens the input and output files in binary mode.

    • Checks for file opening errors.

  • Operation Selection:

    • Compares the second argument to decide whether to compress or decompress.

    • Calls compressFile or decompressFile accordingly.

  • Cleanup:

    • Closes the input and output files.
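
One possible refinement, sketched here as an assumption rather than as part of the program above, is to parse the operation into an enum before any file is opened. Since opening the output file with "wb" truncates it, validating the operation first would keep a mistyped operation from wiping an existing file. The Operation type and parseOperation function are hypothetical names.

```c
#include <string.h>

/* Hypothetical helper type: the two supported operations plus an error value */
typedef enum { OP_INVALID, OP_COMPRESS, OP_DECOMPRESS } Operation;

/* Maps the command-line word to an Operation; anything else is OP_INVALID */
Operation parseOperation(const char *arg) {
    if (arg == NULL) return OP_INVALID;
    if (strcmp(arg, "compress") == 0) return OP_COMPRESS;
    if (strcmp(arg, "decompress") == 0) return OP_DECOMPRESS;
    return OP_INVALID;
}
```

In main, this could be called on argv[2] right after the argument count check, returning early on OP_INVALID before either fopen call.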

Flow of the Program

  1. Start: The program starts in main, parsing the command-line arguments.

  2. File Handling: Opens the source and destination files.

  3. Operation Execution: Based on the user's choice, it either compresses or decompresses the file.

  4. Completion: Closes the files and ends the program.

Error Handling

  • The program checks for file opening errors and reports if either the source or destination files cannot be opened.

  • During compression and decompression, it also checks for file read/write errors and Zlib errors. On such an error the functions clean up and return without reporting it, so main still exits with status 0.

Zlib Specifics

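  • deflateInit and inflateInit: Initialize compression and decompression streams. These streams use the zlib format rather than the gzip format, so the compressed output cannot be opened with gunzip.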

  • deflate and inflate: Functions for compressing and decompressing data.

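  • The assert calls halt the program if Zlib reaches a state that should never occur (such as Z_STREAM_ERROR), which is useful for debugging. They are compiled out when NDEBUG is defined, so they are not a substitute for error handling.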

Important Considerations

  • The program assumes binary mode for files, making it suitable for any file type (not just text).

  • Error handling is basic and might need enhancement for robust applications.

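  • The CHUNK size is a trade-off between memory usage and efficiency: larger buffers mean fewer read, write, and Zlib calls, but each function places two CHUNK-sized buffers on the stack.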

Step 3: Compile the Program with GCC

Use this command in the terminal:

gcc -o myprogram myprogram.c -lz

Replace myprogram with your desired executable name and myprogram.c with the name of your source file. The -lz flag links your program with the Zlib library.
