Abstract: Hardware-friendly architectures with low computational overhead are desirable for low-latency realization of CNNs on resource-constrained embedded platforms. In this work, we ...