Gzip ( .gz ) is a widely used, open-source algorithm and file format developed in 1992 by Jean-loup Gailly and Mark Adler to replace proprietary compression tools. It is the standard for web compression and is frequently used to shrink large, text-heavy files, such as CSVs, to save storage space and increase transfer speeds.
No data is lost; decompressing restores the exact original file.
Here is a deep dive into how gzip works, its applications, and how to use it. How Gzip Works (The Technical Mechanism) cskvdhdgzip
Gzip is heavily integrated into modern data science workflows. Compressing/Decompressing with gzip Module
Used for end-to-end compression (server-to-browser) to speed up website load times. Gzip (
import gzip import shutil # Compress with open('data.csv', 'rb') as f_in: with gzip.open('data.csv.gz', 'wb') as f_out: shutil.copyfileobj(f_in, f_out) # Decompress with gzip.open('data.csv.gz', 'rb') as f_in: with open('data_restored.csv', 'wb') as f_out: shutil.copyfileobj(f_in, f_out) Use code with caution. Copied to clipboard Working with Pandas
Pandas can directly read and write compressed files, making it convenient for large datasets. Here is a deep dive into how gzip
The algorithm scans data to find repeating patterns. Instead of storing the repeated data twice, it replaces subsequent occurrences with a pointer (a pair of numbers: distance and length) to the initial occurrence.