1) The document discusses implementing and evaluating deep neural networks (DNNs) on mainstream heterogeneous systems like CPUs, GPUs, and APUs.
2) Preliminary results show that an APU achieves the highest performance per watt compared to CPUs and GPUs for DNN models like MLP and autoencoders.
3) Data transfers between the CPU and GPU are identified as a bottleneck, but APUs can help avoid this issue through efficient data sharing and zero-copy techniques between the CPU and GPU.