File Compression and Decompression in C Using Zlib

To create a C program that uses Zlib to compress or decompress a file based on command-line arguments, you need to follow these steps:

  1. Install Zlib for Development on Ubuntu:

    • Open a terminal and run the following command to install Zlib development libraries:

        sudo apt-get install zlib1g-dev
      
  2. Write the C Program:

    • The program will use Zlib functions to compress or decompress files.

    • It takes three arguments: the input filename, the operation ('compress' or 'decompress'), and the output filename.

  3. Compile the Program with GCC:

    • Use the -lz flag to link against Zlib.

Let's break down each step in detail.

Step 1: Install Zlib Development Libraries

Run this command in the terminal:

sudo apt-get install zlib1g-dev

This command installs the Zlib development libraries and headers necessary for compiling programs that use Zlib.

Step 2: Writing the C Program

Here is an example program that accomplishes the task:

#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <zlib.h>

#define CHUNK 16384

void compressFile(FILE *source, FILE *dest) {
    int ret, flush;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];

    // Initialize the zlib stream for compression
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    ret = deflateInit(&strm, Z_DEFAULT_COMPRESSION);
    if (ret != Z_OK) return;

    // Compress until end of file
    do {
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)deflateEnd(&strm);
            return;
        }
        flush = feof(source) ? Z_FINISH : Z_NO_FLUSH;
        strm.next_in = in;

        // Run deflate() on input until output buffer not full
        do {
            strm.avail_out = CHUNK;
            strm.next_out = out;
            ret = deflate(&strm, flush);
            assert(ret != Z_STREAM_ERROR);
            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)deflateEnd(&strm);
                return;
            }
        } while (strm.avail_out == 0);
        assert(strm.avail_in == 0);
    } while (flush != Z_FINISH);
    assert(ret == Z_STREAM_END);

    // Clean up
    (void)deflateEnd(&strm);
}

void decompressFile(FILE *source, FILE *dest) {
    int ret;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];

    // Initialize the zlib stream for decompression
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    ret = inflateInit(&strm);
    if (ret != Z_OK) return;

    // Decompress until deflate stream ends or end of file
    do {
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)inflateEnd(&strm);
            return;
        }
        if (strm.avail_in == 0) break;
        strm.next_in = in;

        // Run inflate() on input until output buffer not full
        do {
            strm.avail_out = CHUNK;
            strm.next_out = out;
            ret = inflate(&strm, Z_NO_FLUSH);
            assert(ret != Z_STREAM_ERROR);
            switch (ret) {
                case Z_NEED_DICT:
                case Z_DATA_ERROR:
                case Z_MEM_ERROR:
                    (void)inflateEnd(&strm);
                    return;
            }
            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)inflateEnd(&strm);
                return;
            }
        } while (strm.avail_out == 0);

        // Done when inflate() says it's done
    } while (ret != Z_STREAM_END);

    // Clean up
    (void)inflateEnd(&strm);
}

int main(int argc, char **argv) {
    if (argc != 4) {
        printf("Usage: %s <input file> <compress|decompress> <output file>\n", argv[0]);
        return 1;
    }

    FILE *inFile = fopen(argv[1], "rb");
    FILE *outFile = fopen(argv[3], "wb");

    if (inFile == NULL || outFile == NULL) {
        fprintf(stderr, "Could not open files\n");
        return 1;
    }

    if (strcmp(argv[2], "compress") == 0) {
        compressFile(inFile, outFile);
    } else if (strcmp(argv[2], "decompress") == 0) {
        decompressFile(inFile, outFile);
    } else {
        fprintf(stderr, "Invalid operation\n");
        // Close both files before exiting on an invalid operation
        fclose(inFile);
        fclose(outFile);
        return 1;
    }

    fclose(inFile);
    fclose(outFile);
    return 0;
}

This C program is designed to either compress or decompress a file using the Zlib library. It takes three command-line arguments: the input file name, the operation (either "compress" or "decompress"), and the output file name. Let's break down the code in detail:

Includes and Macro Definition

  • #include statements: Include standard headers and the Zlib header.

    • <stdio.h>: Standard input/output functions.

    • <string.h>: String handling functions.

    • <assert.h>: Provides the assert macro for debugging.

    • <zlib.h>: Zlib library for compression/decompression functions.

  • #define CHUNK 16384: Defines a macro for the size of the buffer used in compression/decompression. Here, it's set to 16,384 bytes.

Function: compressFile(FILE *source, FILE *dest)

This function compresses the data read from source and writes the compressed data to dest.

  • Local Variables:

    • z_stream strm: Struct used by Zlib to maintain compression state.

    • unsigned char in[CHUNK], out[CHUNK]: Buffers for input and output data.

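    • int ret, flush: ret holds the return status of each Zlib call; flush tells deflate whether more input follows (Z_NO_FLUSH) or this is the last chunk (Z_FINISH).

    • unsigned have: The number of compressed bytes that deflate placed in the out buffer during the current call.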

  • Initialization:

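    • Sets zalloc, zfree, and opaque to Z_NULL so that Zlib uses its default memory allocator, then calls deflateInit with Z_DEFAULT_COMPRESSION and returns early if it does not report Z_OK.

  • Compression Loop: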

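    • Reads up to CHUNK bytes from source into the in buffer and checks for read errors.

    • Sets flush to Z_FINISH when feof(source) reports the end of the file, and to Z_NO_FLUSH otherwise.

    • Calls deflate repeatedly, writing whatever it placed in the out buffer to dest after each call, until a call leaves free space in out. At that point all input in the in buffer has been consumed.

    • Continues until the last chunk has been processed with Z_FINISH, after which deflate is expected to have returned Z_STREAM_END.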

Function: decompressFile(FILE *source, FILE *dest)

This function decompresses data from source and writes the decompressed data to dest.

  • Local Variables: The same as in compressFile, except that there is no flush variable because inflate is always called with Z_NO_FLUSH.

  • Initialization:

    • Sets up the z_stream in the same way, calls inflateInit, and returns early if it does not report Z_OK.

  • Decompression Loop:

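    • Reads up to CHUNK bytes from source, checks for read errors, and leaves the loop if no input remains.

    • Calls inflate repeatedly, writing whatever it placed in the out buffer to dest after each call, until a call leaves free space in out.

    • Cleans up and returns if inflate reports Z_NEED_DICT, Z_DATA_ERROR, or Z_MEM_ERROR. The outer loop ends when inflate returns Z_STREAM_END.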

Function: main(int argc, char **argv)

This is the entry point of the program.

  • Argument Check:

    • Checks that argc is exactly 4 (the program name plus three arguments) and prints a usage message otherwise.

  • File Operations:

    • Opens the input and output files in binary mode.

    • Checks for file opening errors.

  • Operation Selection:

    • Compares the second argument (argv[2]) with "compress" and "decompress" using strcmp, and reports an invalid operation if it matches neither.

    • Calls compressFile or decompressFile accordingly.

  • Cleanup:

    • Closes the input and output files.
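The operation check can also be factored into a small function that runs before any file is opened, so that an invalid operation never creates or truncates the output file. The sketch below is one possible shape; the operation enum and the parse_operation name are hypothetical and do not appear in the program above.

```c
#include <string.h>

/* Hypothetical names, introduced only for this sketch. */
typedef enum { OP_COMPRESS, OP_DECOMPRESS, OP_INVALID } operation;

/* Maps the operation argument to an enum value. Anything other than the
   two exact strings, including a NULL pointer, is rejected. */
operation parse_operation(const char *arg) {
    if (arg == NULL)
        return OP_INVALID;
    if (strcmp(arg, "compress") == 0)
        return OP_COMPRESS;
    if (strcmp(arg, "decompress") == 0)
        return OP_DECOMPRESS;
    return OP_INVALID;
}
```

In main, calling parse_operation(argv[2]) right after the argument count check would allow an early return with an error message before either fopen call.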

Flow of the Program

  1. Start: The program starts in main, parsing the command-line arguments.

  2. File Handling: Opens the source and destination files.

  3. Operation Execution: Based on the user's choice, it either compresses or decompresses the file.

  4. Completion: Closes the files and ends the program.

Error Handling

  • The program checks for file opening errors and reports if either the source or destination files cannot be opened.

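  • During compression and decompression, it also checks for file read/write errors and Zlib errors. However, compressFile and decompressFile return void, so these failures are silent: the function cleans up and returns early, and main still exits with status 0.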

Zlib Specifics

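  • deflateInit and inflateInit: Initialize compression and decompression streams. With these initializers the data is in the zlib format, not the gzip format, so the compressed output is not a .gz file and is not readable by gunzip.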

  • deflate and inflate: Functions for compressing and decompressing data.

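  • The assert calls abort the program on conditions that should never occur, such as Z_STREAM_ERROR, which is useful for debugging. They are compiled out when NDEBUG is defined, so they are not a substitute for runtime error handling.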

Important Considerations

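  • The program opens both files in binary mode ("rb" and "wb"), making it suitable for any file type (not just text).

  • Error handling is basic. A more robust version would, for example, return a status code from compressFile and decompressFile so that main can report failures and exit with a nonzero status.

  • The CHUNK size is a trade-off between memory usage and efficiency: larger buffers mean fewer fread, fwrite, deflate, and inflate calls, at the cost of two CHUNK-sized arrays on the stack in each function.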

Step 3: Compile the Program with GCC

Use this command in the terminal:

gcc -o myprogram myprogram.c -lz

Replace myprogram with your desired executable name and myprogram.c with the name of your source file. The -lz flag links your program with the Zlib library.