Over the past few days, I worked on porting the PAQ8PX compressor to ARM CPUs. Initially, it would not compile because it relied on some SSE instructions that are not available on ARM. This required porting some code to equivalent functions for this architecture. Once this was done, paq8px happily compiled and ran on the Snapdragon 845 and 865 CPUs in my Samsung Galaxy S9+ and S20 Ultra smartphones.
I tested it using the Termux app running Ubuntu from the AndroNix project.
The changes needed to make it compile mostly involved adding some #elif directives. These were added below the initial #if directives that check whether we are compiling for the i386 or x86_64 architectures. Each new #elif directive checks whether the platform is ARM. The reason is that the header immintrin.h doesn’t exist on ARM. Instead, we need to include arm_neon.h and make the appropriate changes to convert the SSE/SSE2 instructions to NEON.
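A minimal sketch of that kind of guard is shown here. The predefined macros are the usual GCC, Clang and MSVC ones; the exact conditions used in paq8px may differ, and SIMD_FAMILY and simdFamily() are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Pick the intrinsics header that matches the target architecture.
#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
#include <immintrin.h>        // SSE/AVX intrinsics, x86 only
#define SIMD_FAMILY "x86"
#elif defined(__ARM_NEON) || defined(__ARM_NEON__) || defined(__aarch64__)
#include <arm_neon.h>         // NEON intrinsics, ARM only
#define SIMD_FAMILY "arm-neon"
#else
#define SIMD_FAMILY "none"    // neither header is available: portable code only
#endif

// Hypothetical helper: reports which branch was taken at compile time.
inline std::string simdFamily() { return SIMD_FAMILY; }
```

Calling simdFamily() on a given build shows which header was pulled in, which is a quick way to confirm that the ARM branch is really being compiled.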
Two main files had SSE2/SSSE3 instructions that needed to be ported to NEON. Strictly speaking, these ports were not required, as PAQ8PX contains code that doesn’t depend on the instruction set, but porting them lets the software take full advantage of the CPU.
I started by making the changes to the Mixer.hpp file, which had code for SSE2 and AVX2.
After porting the code, two new functions were added, along with some SSE2-to-NEON helper functions.
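To give an idea of what an SSE2-to-NEON helper can look like, here is a simplified sketch. It assumes the hot loop is a dot product of 16-bit values built on _mm_madd_epi16, which has no single NEON equivalent. The names maddS16, dotProductScalar and dotProductSimd are hypothetical and this is not the code from Mixer.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__)
#include <emmintrin.h>
#elif defined(__ARM_NEON) || defined(__aarch64__)
#include <arm_neon.h>

// NEON stand-in for _mm_madd_epi16: multiply 8 pairs of int16 values,
// then add adjacent 32-bit products, giving 4 int32 sums.
static inline int32x4_t maddS16(int16x8_t a, int16x8_t b) {
  int32x4_t lo = vmull_s16(vget_low_s16(a), vget_low_s16(b));
  int32x4_t hi = vmull_s16(vget_high_s16(a), vget_high_s16(b));
  int32x2_t loSum = vpadd_s32(vget_low_s32(lo), vget_high_s32(lo));
  int32x2_t hiSum = vpadd_s32(vget_low_s32(hi), vget_high_s32(hi));
  return vcombine_s32(loSum, hiSum);
}
#endif

// Portable reference implementation.
inline int32_t dotProductScalar(const int16_t* x, const int16_t* w, size_t n) {
  int32_t sum = 0;
  for (size_t i = 0; i < n; ++i) sum += int32_t(x[i]) * w[i];
  return sum;
}

// SIMD version; n must be a multiple of 8.
inline int32_t dotProductSimd(const int16_t* x, const int16_t* w, size_t n) {
  assert(n % 8 == 0);
#if defined(__SSE2__)
  __m128i acc = _mm_setzero_si128();
  for (size_t i = 0; i < n; i += 8) {
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x + i));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(w + i));
    acc = _mm_add_epi32(acc, _mm_madd_epi16(a, b));
  }
  int32_t lanes[4];
  _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
  return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#elif defined(__ARM_NEON) || defined(__aarch64__)
  int32x4_t acc = vdupq_n_s32(0);
  for (size_t i = 0; i < n; i += 8)
    acc = vaddq_s32(acc, maddS16(vld1q_s16(x + i), vld1q_s16(w + i)));
  return vgetq_lane_s32(acc, 0) + vgetq_lane_s32(acc, 1) +
         vgetq_lane_s32(acc, 2) + vgetq_lane_s32(acc, 3);
#else
  return dotProductScalar(x, w, n);  // no SIMD available
#endif
}
```

Keeping the scalar version next to the SIMD one makes it easy to check that a port produces the same result on both architectures.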
MixerFactory.cpp needed some additional else if branches and some additional conditions in order to support the NEON code.
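The shape of such a factory can be sketched roughly as follows. Every name here (the Simd enum, the Mixer classes and createMixer) is hypothetical and only illustrates the idea of adding a NEON branch to an existing if / else if chain; the real MixerFactory.cpp is organized differently.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical SIMD selector; the values used by paq8px are not shown here.
enum class Simd { None, SSE2, AVX2, NEON };

struct Mixer {
  virtual ~Mixer() = default;
  virtual std::string name() const = 0;
};
struct ScalarMixer : Mixer { std::string name() const override { return "scalar"; } };
struct Sse2Mixer   : Mixer { std::string name() const override { return "sse2"; } };
struct Avx2Mixer   : Mixer { std::string name() const override { return "avx2"; } };
struct NeonMixer   : Mixer { std::string name() const override { return "neon"; } };

// One extra else-if branch is enough to route ARM builds to the NEON mixer.
inline std::unique_ptr<Mixer> createMixer(Simd simd) {
  if (simd == Simd::AVX2) {
    return std::unique_ptr<Mixer>(new Avx2Mixer());
  } else if (simd == Simd::SSE2) {
    return std::unique_ptr<Mixer>(new Sse2Mixer());
  } else if (simd == Simd::NEON) {
    return std::unique_ptr<Mixer>(new NeonMixer());
  }
  return std::unique_ptr<Mixer>(new ScalarMixer());  // portable fallback
}
```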
The other file that contained SSSE3 instructions and needed to be ported to NEON was Bucket.hpp. I also did a major refactoring to let the code choose which SIMD to use, according to either the system spec or the user-specified SIMD.
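The selection rule itself is simple and can be sketched like this, assuming a user setting that defaults to "auto" and a separately detected capability. SimdKind, chooseSimd and simdName are hypothetical names, and the detection step (for example CPUID on x86) is left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical selector: Auto means "use whatever the system supports".
enum class SimdKind { Auto, None, SSE2, SSSE3, AVX2, NEON };

// An explicit user request wins; otherwise fall back to the detected capability.
inline SimdKind chooseSimd(SimdKind requested, SimdKind detected) {
  return requested == SimdKind::Auto ? detected : requested;
}

// Human-readable label, handy for logging which code path was picked.
inline std::string simdName(SimdKind kind) {
  switch (kind) {
    case SimdKind::None:  return "none";
    case SimdKind::SSE2:  return "sse2";
    case SimdKind::SSSE3: return "ssse3";
    case SimdKind::AVX2:  return "avx2";
    case SimdKind::NEON:  return "neon";
    default:              return "auto";
  }
}
```

With this split, forcing the portable path for debugging is just a matter of passing SimdKind::None as the request, whatever the hardware reports.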
I ported the SSSE3 code to AVX2:
The SSSE3 function is now called
And here we have the NEON code:
If no SIMD is specified or detected, the code will run
Finally, here’s the find function. It was extended to accept a second argument indicating which SIMD to use.
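As a rough illustration of a lookup that takes the SIMD choice as its second argument, here is a simplified sketch. It assumes a bucket of eight 16-bit checksums and a GCC or Clang compiler (for the __builtin_ctz family). Bucket8, FindSimd and the slot layout are hypothetical and much simpler than the real Bucket.hpp.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__)
#include <emmintrin.h>
#elif defined(__ARM_NEON) || defined(__aarch64__)
#include <arm_neon.h>
#endif

enum class FindSimd { None, SSE2, NEON };  // hypothetical selector

struct Bucket8 {
  uint16_t slots[8];  // assumed layout: eight 16-bit checksums

  // Returns the index of the first slot equal to checksum, or -1 if absent.
  int find(uint16_t checksum, FindSimd simd) const {
#if defined(__SSE2__)
    if (simd == FindSimd::SSE2) {
      __m128i v  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(slots));
      __m128i eq = _mm_cmpeq_epi16(v, _mm_set1_epi16(static_cast<short>(checksum)));
      int mask = _mm_movemask_epi8(eq);            // two mask bits per 16-bit lane
      return mask ? __builtin_ctz(mask) / 2 : -1;
    }
#elif defined(__ARM_NEON) || defined(__aarch64__)
    if (simd == FindSimd::NEON) {
      uint16x8_t eq = vceqq_u16(vld1q_u16(slots), vdupq_n_u16(checksum));
      // Narrow each lane to one byte so the comparison result fits in 64 bits.
      uint64_t mask = vget_lane_u64(vreinterpret_u64_u8(vmovn_u16(eq)), 0);
      return mask ? __builtin_ctzll(mask) / 8 : -1;
    }
#endif
    (void)simd;  // requested SIMD not compiled in: use the portable loop
    for (int i = 0; i < 8; ++i)
      if (slots[i] == checksum) return i;
    return -1;
  }
};
```

If the requested SIMD is not available in the current build, the call quietly falls through to the portable loop, so the same call site works on x86 and ARM.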
This change required adding this second argument to the corresponding lines of
This file required adding some
Then, I’m returning the value 11 if we have compiled paq8px on an ARM processor:
The paq8px.cpp file was updated to accept the NEON argument when using the
And those were the main changes done to paq8px.
The latest version is currently v187, which completes the port of the SSE2/SSSE3 code to NEON and fixes some bugs from previous versions.
You can follow paq8px development on the encode.su forum.