This document describes a proposed sorting accelerator for heterogeneous many-core systems. The accelerator combines a sorting network with a merge sorter tree to sort data in parallel. It reads unsorted data from DRAM, processes it on an FPGA through the sorting network and the merge sorter tree, and writes the sorted data back to DRAM. A worked example traces 256 elements step by step through the sorting network and the merge sorter tree until the data is fully sorted.
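
The two-stage idea can be illustrated with a small software model. This is only a sketch under stated assumptions, not the accelerator's actual design: the summary does not say which sorting network is used, how many inputs it takes, or how wide the merge sorter tree is. The model assumes a bitonic network over 8-element chunks that feeds a binary tree of 2-way mergers, and all function names (`bitonic_network`, `apply_network`, `merge_two`, `merge_tree`, `sort_accelerator_model`) are hypothetical.

```python
def bitonic_network(n):
    """Build the compare-exchange stages of a bitonic sorting network.

    n must be a power of two. Each stage is a list of (lo, hi) index pairs;
    after a pair is applied, values[lo] <= values[hi]. Pairs within one
    stage touch disjoint indices, so hardware could evaluate them in parallel.
    """
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    stages = []
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            stage = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    stage.append((i, partner) if ascending else (partner, i))
            stages.append(stage)
            j //= 2
        k *= 2
    return stages


def apply_network(values, stages):
    """Run one chunk through the network and return a sorted copy."""
    out = list(values)
    for stage in stages:
        for lo, hi in stage:
            if out[lo] > out[hi]:
                out[lo], out[hi] = out[hi], out[lo]
    return out


def merge_two(a, b):
    """2-way merge of two sorted lists (one node of the assumed merge tree)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def merge_tree(runs):
    """Merge sorted runs pairwise, level by level, until one run remains."""
    if not runs:
        return []
    while len(runs) > 1:
        next_level = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                next_level.append(merge_two(runs[k], runs[k + 1]))
            else:
                next_level.append(runs[k])  # odd run passes through unchanged
        runs = next_level
    return runs[0]


def sort_accelerator_model(data, chunk=8):
    """Assumed flow: sort fixed-size chunks in the network, then merge them."""
    assert len(data) % chunk == 0, "input length must be a multiple of chunk"
    stages = bitonic_network(chunk)
    runs = [apply_network(data[i:i + chunk], stages)
            for i in range(0, len(data), chunk)]
    return merge_tree(runs)
```

With these assumed parameters, a 256-element input is split into 32 chunks of 8. The network turns each chunk into a sorted run, and the tree then needs five merge levels (32 runs, then 16, 8, 4, 2, 1) to produce the fully sorted output. In this model the lists passed in and returned stand in for the DRAM read and write-back.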