Heterogeneous systems composed of a CPU and a set of hardware accelerators have become one of the most common architectures today, thanks to their excellent performance and energy efficiency. However, their heterogeneity makes them very complex to program, and achieving performance portability across different devices is even harder. This paper presents EngineCL, a new OpenCL-based runtime system that notably simplifies the execution of a single massively data-parallel kernel on a heterogeneous system. It takes care of a set of low-level tasks related to the management of devices and their disjoint memory spaces. EngineCL has been validated on two different architectures with a set of devices. Experimental results show that it has excellent usability and a negligible overhead compared to the native version.