File compression and decompression in C using ZLib
I am Jyotiprakash, a deeply driven computer systems engineer, software developer, teacher, and philosopher. With a decade of professional experience, I have contributed to various cutting-edge software products in network security, mobile apps, and healthcare software at renowned companies like Oracle, Yahoo, and Epic. My academic journey has taken me to prestigious institutions such as the University of Wisconsin-Madison and BITS Pilani in India, where I consistently ranked among the top of my class.
At my core, I am a computer enthusiast with a profound interest in understanding the intricacies of computer programming. My skills are not limited to application programming in Java; I have also delved deeply into computer hardware, learning about various architectures, low-level assembly programming, Linux kernel implementation, and writing device drivers. The contributions of Linus Torvalds, Ken Thompson, and Dennis Ritchie—who revolutionized the computer industry—inspire me. I believe that real contributions to computer science are made by mastering all levels of abstraction and understanding systems inside out.
In addition to my professional pursuits, I am passionate about teaching and sharing knowledge. I have spent two years as a teaching assistant at UW Madison, where I taught complex concepts in operating systems, computer graphics, and data structures to both graduate and undergraduate students. Currently, I am an assistant professor at KIIT, Bhubaneswar, where I continue to teach computer science to undergraduate and graduate students. I am also working on writing a few free books on systems programming, as I believe in freely sharing knowledge to empower others.
To create a C program that uses Zlib to compress or decompress a file based on command-line arguments, you need to follow these steps:
Install Zlib for Development on Ubuntu:
Open a terminal and run the following command to install Zlib development libraries:
sudo apt-get install zlib1g-dev
Write the C Program:
The program will use Zlib functions to compress or decompress files.
It takes three arguments: the input filename, the operation ('compress' or 'decompress'), and the output filename.
Compile the Program with GCC:
- Use the -lz flag to link against Zlib.
Let's break down each step in detail.
Step 1: Install Zlib Development Libraries
Run this command in the terminal:
sudo apt-get install zlib1g-dev
This command installs the Zlib development libraries and headers necessary for compiling programs that use Zlib.
Step 2: Writing the C Program
Here is an example program that accomplishes the task:
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <zlib.h>
#define CHUNK 16384
void compressFile(FILE *source, FILE *dest) {
    int ret, flush;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];

    // Initialize the zlib stream for compression
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    ret = deflateInit(&strm, Z_DEFAULT_COMPRESSION);
    if (ret != Z_OK) return;

    // Compress until end of file
    do {
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)deflateEnd(&strm);
            return;
        }
        flush = feof(source) ? Z_FINISH : Z_NO_FLUSH;
        strm.next_in = in;

        // Run deflate() on input until output buffer not full
        do {
            strm.avail_out = CHUNK;
            strm.next_out = out;
            ret = deflate(&strm, flush);
            assert(ret != Z_STREAM_ERROR);
            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)deflateEnd(&strm);
                return;
            }
        } while (strm.avail_out == 0);
        assert(strm.avail_in == 0);
    } while (flush != Z_FINISH);
    assert(ret == Z_STREAM_END);

    // Clean up
    (void)deflateEnd(&strm);
}
void decompressFile(FILE *source, FILE *dest) {
    int ret;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];

    // Initialize the zlib stream for decompression.
    // inflateInit() expects avail_in and next_in to be set as well.
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    strm.avail_in = 0;
    strm.next_in = Z_NULL;
    ret = inflateInit(&strm);
    if (ret != Z_OK) return;

    // Decompress until deflate stream ends or end of file
    do {
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)inflateEnd(&strm);
            return;
        }
        if (strm.avail_in == 0) break;
        strm.next_in = in;

        // Run inflate() on input until output buffer not full
        do {
            strm.avail_out = CHUNK;
            strm.next_out = out;
            ret = inflate(&strm, Z_NO_FLUSH);
            assert(ret != Z_STREAM_ERROR);
            switch (ret) {
            case Z_NEED_DICT:
            case Z_DATA_ERROR:
            case Z_MEM_ERROR:
                (void)inflateEnd(&strm);
                return;
            }
            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)inflateEnd(&strm);
                return;
            }
        } while (strm.avail_out == 0);
        // Done when inflate() says it's done
    } while (ret != Z_STREAM_END);

    // Clean up
    (void)inflateEnd(&strm);
}
int main(int argc, char **argv) {
    if (argc != 4) {
        fprintf(stderr, "Usage: %s <input file> <compress|decompress> <output file>\n", argv[0]);
        return 1;
    }

    FILE *inFile = fopen(argv[1], "rb");
    FILE *outFile = fopen(argv[3], "wb");
    if (inFile == NULL || outFile == NULL) {
        fprintf(stderr, "Could not open files\n");
        // Close whichever file did open
        if (inFile != NULL) fclose(inFile);
        if (outFile != NULL) fclose(outFile);
        return 1;
    }

    if (strcmp(argv[2], "compress") == 0) {
        compressFile(inFile, outFile);
    } else if (strcmp(argv[2], "decompress") == 0) {
        decompressFile(inFile, outFile);
    } else {
        fprintf(stderr, "Invalid operation\n");
        // Do not leak the open files on the error path
        fclose(inFile);
        fclose(outFile);
        return 1;
    }

    fclose(inFile);
    fclose(outFile);
    return 0;
}
This C program is designed to either compress or decompress a file using the Zlib library. It takes three command-line arguments: the input file name, the operation (either "compress" or "decompress"), and the output file name. Let's break down the code in detail:
Includes and Macro Definition
#include statements: Include standard headers and the Zlib header.
- <stdio.h>: Standard input/output functions.
- <string.h>: String handling functions.
- <assert.h>: Provides the assert macro for debugging.
- <zlib.h>: Zlib library for compression/decompression functions.
#define CHUNK 16384: Defines a macro for the size of the buffer used in compression/decompression. Here, it's set to 16,384 bytes.
Function: compressFile(FILE *source, FILE *dest)
This function compresses the data read from source and writes the compressed data to dest.
Local Variables:
- z_stream strm: Struct used by Zlib to maintain compression state.
- unsigned char in[CHUNK], out[CHUNK]: Buffers for input and output data.
- int ret, flush: Control variables for the compression loop and return status.
- unsigned have: The number of bytes obtained after compression.
Initialization:
- Initializes the z_stream and checks if deflateInit was successful.
Compression Loop:
- Reads data from source and checks for file errors.
- Sets flush based on whether the end of the file is reached.
- Compresses the data in the in buffer and writes it to the out buffer.
- Continues until all data is compressed (flush is Z_FINISH).
Function: decompressFile(FILE *source, FILE *dest)
This function decompresses data from source and writes the decompressed data to dest.
Local Variables: Similar to compressFile, but for decompression.
Initialization:
- Initializes the z_stream for decompression and checks if inflateInit was successful.
Decompression Loop:
- Reads and checks for errors similarly.
- Decompresses data and writes to the output file.
- Handles different return statuses from inflate.
Function: main(int argc, char **argv)
This is the entry point of the program.
Argument Check:
- Checks if the program received exactly 4 arguments (including the program name).
File Operations:
Opens the input and output files in binary mode.
Checks for file opening errors.
Operation Selection:
Compares the second argument to decide whether to compress or decompress.
Calls compressFile or decompressFile accordingly.
Cleanup:
- Closes the input and output files.
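One detail of this flow is worth noting: the output file is opened with "wb" before the operation string is checked, so a mistyped operation still truncates an existing output file. A possible refinement is to validate the operation first. The sketch below shows only that validation step; the enum operation and the function parse_operation are hypothetical names introduced for this example.

```c
#include <string.h>

/* Hypothetical type describing what the user asked for. */
typedef enum { OP_COMPRESS, OP_DECOMPRESS, OP_INVALID } operation;

/* Hypothetical helper: maps the second command-line argument to an
   operation without touching any files. */
operation parse_operation(const char *arg) {
    if (arg == NULL) return OP_INVALID;
    if (strcmp(arg, "compress") == 0) return OP_COMPRESS;
    if (strcmp(arg, "decompress") == 0) return OP_DECOMPRESS;
    return OP_INVALID;
}
```

main could call parse_operation(argv[2]) right after the argument count check and return early on OP_INVALID, before any call to fopen.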
Flow of the Program
Start: The program starts in main, parsing the command-line arguments.
File Handling: Opens the source and destination files.
Operation Execution: Based on the user's choice, it either compresses or decompresses the file.
Completion: Closes the files and ends the program.
Error Handling
The program checks for file opening errors and reports if either the source or destination files cannot be opened.
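During compression and decompression, it also checks for file read/write errors and Zlib error codes. Because compressFile and decompressFile return void, these failures stop the operation without any message, and main still exits with status 0.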
Zlib Specifics
- deflateInit and inflateInit: Initialize compression and decompression streams.
- deflate and inflate: Functions for compressing and decompressing data.
- The use of assert halts the program if there's an unexpected Zlib error, which is useful for debugging. Note that assert is compiled out when NDEBUG is defined, so these checks disappear in such builds.
Important Considerations
The program assumes binary mode for files, making it suitable for any file type (not just text).
Error handling is basic and might need enhancement for robust applications.
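The CHUNK size is a trade-off between memory usage and efficiency: larger buffers mean fewer read, write, and Zlib calls, at the cost of more stack memory (each function holds two buffers of CHUNK bytes).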
Step 3: Compile the Program with GCC
Use this command in the terminal:
gcc -o myprogram myprogram.c -lz
Replace myprogram with your desired executable name and myprogram.c with the name of your source file. The -lz flag links your program with the Zlib library.