Ongoing Updates to the paq8px_script.py script to compress files using PAQ8PX

Hi everyone,

Over the past few days, I’ve revisited paq8px_script.py, an old Python script I wrote to compress files using the paq8px compressor. The script builds a file list that we can feed to paq8px so that it can compress multiple files into a single archive. I wrote an introductory post about this project back in 2018.
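To illustrate the file list idea, here is a minimal sketch. It is not the code from paq8px_script.py. The function name write_filelist and the header text are made up for illustration, and it assumes that paq8px treats the first line of the list file as a header and reads one file name per line after it, relative to the list file's folder. Check the paq8px help text for the exact format before relying on this.

```python
from pathlib import Path


def write_filelist(input_dir, listfile_path):
    """Write a list file naming every regular file directly inside input_dir.

    Assumption: the first line is a header that paq8px skips, and each
    following line is one file name relative to the list file's folder.
    """
    input_dir = Path(input_dir)
    listfile_path = Path(listfile_path)
    # Collect names first, skipping the list file itself if it already exists.
    names = sorted(
        p.name
        for p in input_dir.iterdir()
        if p.is_file() and p.resolve() != listfile_path.resolve()
    )
    with open(listfile_path, "w", encoding="utf-8") as fh:
        fh.write("filelist\n")  # hypothetical header line
        for name in names:
            fh.write(name + "\n")
    return names
```

The resulting text file would then be passed to paq8px with an @ prefix, as in the sample output further down in this post.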

An update I made in 2020 added multithreaded compression by creating one paq8px archive per file. This is useful when we have many files and a CPU with many threads. The code worked, but I didn’t test it thoroughly and never wrote an official post about it.

Now that I’ve revisited this script, I have updated it and tested it on Ubuntu Linux 22.10. It has received a few improvements:

New Changes

The following changes have been made to the script since its introduction:

  • -n or --nativecpu: Use a “Native CPU” paq8px executable. I always name these with the suffix _nativecpu. This switch uses that build rather than the normal paq8px build. These builds work best on AVX2 machines and may crash on CPUs lacking this instruction set. If you run the script on a non-AVX2 CPU, do not use this switch.
  • -mt or --multithread: Compress one file per CPU thread. This will compress files individually rather than adding them all to a single archive. If you prefer the latter, do not use this switch.
  • -to or --test-only: Tests an already-compressed archive, skipping compression. The default action of this script is to compress only. You can perform testing by supplying the -t flag, but if you just want to test an already existing archive, you only need to pass the -to flag.
  • paq8px v207 is used as the default version.
  • The level argument by default is -9.

Next steps

The next step for this script is to add support for extracting archives, including multithreaded extraction. The script also lacks multithreaded verification, which I will be working on in the future.

The script has already been refactored and split into functions, which makes it easier to manage and update. It is also more readable now.

Ever since switching to Linux, I’ve been using Python more often to create scripts to help me with my tasks, and this piece of code is essential to my data compression and archiving needs.

Script Usage and Arguments as of this post

usage: paq8px_script.py [-h] -i INPUT [-v VERSION] [-l LEVEL] [-o OUTPUT] [-t] [-to] [-r] [-mt]
                        [-n]

This script will generate a filelist file which will be used by paq8px_v207 for compressing. It is
also used for testing if you use the -t or -to argument

options:
  -h, --help            show this help message and exit

required arguments:
  -i INPUT, --input INPUT
                        Input file or folder to compress. REQUIRED

optional arguments:
  -v VERSION, --version VERSION
                        Version of PAQ8PX to use. Example: 207. Default is 207
  -l LEVEL, --level LEVEL
                        Compression level and switches. Example: 9a to compress using level 9 and
                        with the 'Adaptive learning rate' switch. Default is 9
  -o OUTPUT, --output OUTPUT
                        Output file to use. If not used, the archive will be saved at the root of
                        the parent folder where the file/folder to compress is located. Do not
                        provide extension
  -t, --test            Optional flag to test the archive after compressing it. It is recommended
                        to use this option. Default is not to test
  -to, --test-only      Skip compression and just test the archive.
  -r, --remove          Deletes the filelist text file. Not recommended unless you plan not to
                        test the archive later. Default is not to remove
  -mt, --multithread    Compresses each file on a separate thread. This creates individual
                        archives with just one file
  -n, --nativecpu       Use the native CPU version. These versions usually ends with _nativecpu
                        and may provide performane improvements on your machine over the generic
                        version
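For readers curious how a command line like the one above can be declared, here is a minimal argparse sketch. It only mirrors the option names and defaults shown in the help text. The function name build_parser and the shortened help strings are my own placeholders, and the real script may define its parser differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the help text above."""
    parser = argparse.ArgumentParser(prog="paq8px_script.py")
    required = parser.add_argument_group("required arguments")
    required.add_argument("-i", "--input", required=True,
                          help="Input file or folder to compress")
    optional = parser.add_argument_group("optional arguments")
    optional.add_argument("-v", "--version", default="207",
                          help="Version of PAQ8PX to use")
    optional.add_argument("-l", "--level", default="9",
                          help="Compression level and switches, e.g. 9a")
    optional.add_argument("-o", "--output",
                          help="Output file, without extension")
    optional.add_argument("-t", "--test", action="store_true",
                          help="Test the archive after compressing it")
    optional.add_argument("-to", "--test-only", action="store_true",
                          help="Skip compression and just test the archive")
    optional.add_argument("-r", "--remove", action="store_true",
                          help="Delete the filelist text file")
    optional.add_argument("-mt", "--multithread", action="store_true",
                          help="Compress each file on a separate thread")
    optional.add_argument("-n", "--nativecpu", action="store_true",
                          help="Use the native CPU build")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

argparse matches -t and -to as exact option strings, so the two flags do not collide.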

Downloads

paq8px script repository on GitHub.

Sample Script Output

Below is an example output of the script in its current state, where I compressed the images used in my post about creating QR codes in Linux:

python paq8px_script.py -i "/home/moisespr123/Pictures/Generating QR codes using qrencode on Linux" -o "/home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/screenshots" -n -t -r
Listing files to compress and test
1 - Installing qrencode.png
2 - qrencode installed.png
3 - Creating a QR code and displaying on the screen.png
4 - QR code shown on screen with imagemagick's display tool.png
5 - Saving a QR code as a PNG image.png
6 - QR code saved.png
7 - Install imagemagick.png
Writing filelist.txt

Starting compression...

./paq8px_v207_nativecpu -9 "@/home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/Generating QR codes using qrencode on Linux.txt" "/home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/screenshots.paq8px207"
paq8px archiver v207 (c) 2022, Matt Mahoney et al.

Creating archive /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/screenshots.paq8px207 in multiple file mode with 7 files...
1/2 - Filename of listfile : 48 bytes
2/2 - Content of listfile  : 267 bytes
----- Compressed to        : 143 bytes

1/7 - Filename: /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/1 - Installing qrencode.png (18928 bytes)
Block segmentation:
 0           | default          |     18928 bytes [0 - 18927]
File input size       : 18928
File compressed size  : 17077

2/7 - Filename: /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/2 - qrencode installed.png (78253 bytes)
Block segmentation:
 0           | default          |     78253 bytes [0 - 78252]
File input size       : 78253
File compressed size  : 72124

3/7 - Filename: /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/3 - Creating a QR code and displaying on the screen.png (54622 bytes)
Block segmentation:
 0           | default          |     54622 bytes [0 - 54621]
File input size       : 54622
File compressed size  : 52977

4/7 - Filename: /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/4 - QR code shown on screen with imagemagick's display tool.png (21197 bytes)
Block segmentation:
 0           | default          |     21197 bytes [0 - 21196]
File input size       : 21197
File compressed size  : 18950

5/7 - Filename: /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/5 - Saving a QR code as a PNG image.png (21843 bytes)
Block segmentation:
 0           | default          |     21843 bytes [0 - 21842]
File input size       : 21843
File compressed size  : 19512

6/7 - Filename: /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/6 - QR code saved.png (25529 bytes)
Block segmentation:
 0           | default          |     25529 bytes [0 - 25528]
File input size       : 25529
File compressed size  : 22719

7/7 - Filename: /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/7 - Install imagemagick.png (36709 bytes)
Block segmentation:
 0           | default          |     36709 bytes [0 - 36708]
File input size       : 36709
File compressed size  : 33347
-----------------------
Total input size     : 257081
Total archive size   : 236858

Time 57.74 sec, used 4348 MB (4560025975 bytes) of memory

Verifying archive...

./paq8px_v207_nativecpu -t "/home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/screenshots.paq8px207" "/home/moisespr123/Pictures/Generating QR codes using qrencode on Linux"
paq8px archiver v207 (c) 2022, Matt Mahoney et al.

Comparing /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/1 - Installing qrencode.png 18928 bytes -> identical
Comparing /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/2 - qrencode installed.png 78253 bytes -> identical
Comparing /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/3 - Creating a QR code and displaying on the screen.png 54622 bytes -> identical
Comparing /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/4 - QR code shown on screen with imagemagick's display tool.png 21197 bytes -> identical
Comparing /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/5 - Saving a QR code as a PNG image.png 21843 bytes -> identical
Comparing /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/6 - QR code saved.png 25529 bytes -> identical
Comparing /home/moisespr123/Pictures/Generating QR codes using qrencode on Linux/7 - Install imagemagick.png 36709 bytes -> identical
Time 56.33 sec, used 4348 MB (4560021823 bytes) of memory

Removing the filelist file

Compression and testing finished!