Computational fluid and particle dynamics simulations (CFPD) are of paramount importance for studying and improving drug effectiveness. Computational requirements of CFPD codes involves high-performance computing (HPC) resources. For these reasons we introduce and evaluate in this paper system software techniques for improving performance and tolerate load imbalance on a state-of-the-art production CFPD code. We demonstrate benefits of these techniques on both Intel- and Arm-based HPC clusters showing the importance of using mechanisms applied at runtime to improve the performance independently of the underlying architecture. We run a real CFPD simulation of particle tracking on the human respiratory system, showing performance improvements of up to 2X, keeping the computational resources constant.