
Showing posts with the label scheduling

Towards Optimal Transaction Scheduling

This paper (VLDB 2024) looks at boosting transaction throughput through better scheduling. The idea is to explore the schedule space more systematically and pick execution orders that reduce conflicts. The paper's two main contributions are a scheduling policy called Shortest Makespan First (SMF) and an MVTSO concurrency control variant called MVSchedO. SMF uses a greedy heuristic to pick low-conflict schedules that minimize the increase in total execution time (makespan) at each step. MVSchedO enforces the chosen schedule at a fine-grained level by adapting multi-version timestamp ordering (MVTSO). The authors implement SMF and MVSchedO in RocksDB (R-SMF) and show up to 3.9x higher throughput and 3.2x lower tail latency on benchmarks as well as on Meta's TAO workload. I mean, this is stellar systems work, and it makes a convincing case that search-based scheduling is a promising direction for extracting higher throughput from database systems. Motivation: Reducing contention...
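
To make the SMF idea concrete, here is a minimal, hypothetical sketch of a shortest-makespan-first greedy loop. The conflict test (write sets intersecting read/write sets), the duration estimates, and the assumption that non-conflicting transactions overlap freely are my simplifications for illustration, not the paper's actual cost model or code:

    # Illustrative SMF-style greedy scheduler (a simplified sketch, not the paper's code).
    # Assumption: two transactions conflict if one's write set touches the other's
    # read or write set; conflicting transactions must run back-to-back.

    def conflicts(t1, t2):
        return bool(t1["writes"] & (t2["reads"] | t2["writes"]) or
                    t2["writes"] & (t1["reads"] | t1["writes"]))

    def finish_time(txn, schedule, end_times):
        # Earliest finish if txn is appended: it starts after every already-scheduled
        # conflicting transaction ends (non-conflicting ones overlap freely).
        start = max((end_times[other["id"]] for other in schedule if conflicts(txn, other)),
                    default=0.0)
        return start + txn["duration"]

    def smf_schedule(pending):
        schedule, end_times, makespan = [], {}, 0.0
        remaining = list(pending)
        while remaining:
            # Greedy step: admit the transaction whose addition grows the makespan least.
            best = min(remaining,
                       key=lambda t: max(makespan, finish_time(t, schedule, end_times)))
            end_times[best["id"]] = finish_time(best, schedule, end_times)
            makespan = max(makespan, end_times[best["id"]])
            schedule.append(best)
            remaining.remove(best)
        return [t["id"] for t in schedule], makespan

    txns = [{"id": 1, "duration": 2.0, "reads": {"x"}, "writes": {"y"}},
            {"id": 2, "duration": 1.0, "reads": {"y"}, "writes": {"z"}},
            {"id": 3, "duration": 1.5, "reads": {"a"}, "writes": {"b"}}]
    print(smf_schedule(txns))   # ([2, 3, 1], 3.0): txn 1 is serialized after its conflict with txn 2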

Paper summary. Decoupling the Control Plane from Program Control Flow for Flexibility and Performance in Cloud Computing

This paper appeared in EuroSys 2018 and is authored by Hang Qu, Omid Mashayekhi, Chinmayee Shah, and Philip Levis from Stanford University. I liked the paper a lot; it is well written and well presented. And I am getting lazy, so I use a lot of text from the paper in my summary below. Problem motivation: In data processing frameworks, improved parallelism is the holy grail because it gets more data processed in less time. However, parallelism has a nemesis called the control plane. While control plane can take on a wide array of meanings, in this paper the control plane is defined as the systems and protocols for scheduling computations, load balancing, and recovering from failures. A centralized control plane becomes a bottleneck after a point. The paper cites other papers and states that a typical cloud framework control plane that uses a fully centralized design can dispatch fewer than 10,000 tasks per second. Actually, that is not bad! However, with machine learning (ML) app...
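
As a back-of-the-envelope illustration of that bottleneck (my hypothetical numbers, not the paper's measurements): with roughly 100 microseconds of control-plane bookkeeping per task, a centralized dispatcher tops out around 10,000 tasks per second, which can only keep a modest number of workers busy when tasks are short, as ML iterations often are.

    # Hypothetical arithmetic, not from the paper: per-task control-plane overhead
    # caps the dispatch rate, and short tasks turn that cap into a cluster-size cap.
    dispatch_overhead_s = 100e-6                  # assumed 100 us of scheduling/bookkeeping per task
    max_dispatch_rate = 1 / dispatch_overhead_s   # ~10,000 tasks/sec ceiling

    task_duration_s = 0.01                        # assumed 10 ms tasks (short ML iterations)
    max_busy_workers = max_dispatch_rate * task_duration_s
    print(f"ceiling: {max_dispatch_rate:.0f} tasks/s, "
          f"keeps at most ~{max_busy_workers:.0f} workers busy")   # ~100 workers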

Two paper summaries on scheduling in heterogeneous computing

Today, I have two short paper summaries from HotCloud'17 on heterogeneous computing. Heterogeneous GPU reallocation: This paper appeared in HotCloud'17, and the authors are James Gleeson and Eyal de Lara, University of Toronto. It looks like they have recently developed a GPU virtualization tool called Crane. "General purpose GPU (GPGPU) computing in virtualized environments leverages PCI passthrough to achieve GPU performance comparable to bare-metal execution. However, GPU passthrough prevents service administrators from performing virtual machine migration between physical hosts. Crane is a new technique for virtualizing OpenCL-based GPGPU computing that achieves within 5.25% of passthrough GPU performance while supporting VM migration. Crane interposes a virtualization-aware OpenCL library that makes it possible to reclaim and subsequently reassign physical GPUs to a VM without terminating the guest or its applications. Crane also enables continued GPU operation...
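
The interposition idea in the abstract can be sketched roughly as follows. This is a hypothetical mock-up, not Crane's code and not the real OpenCL API: a virtualization-aware wrapper owns the device handle, so the hypervisor can reclaim the physical GPU and later hand back another one while the guest application keeps calling the same interface.

    # Hypothetical sketch of library interposition for GPU reclaim/reassign
    # (illustrative only; Crane interposes the actual OpenCL library).

    class FakeGPUBackend:                          # stand-in for a real GPU library
        def acquire_device(self): return "gpu0"
        def release_device(self, dev): pass
        def launch(self, dev, kernel, args): return f"ran {kernel} on {dev}"

    class InterposedContext:
        def __init__(self, backend):
            self.backend = backend
            self.device = backend.acquire_device()
            self.suspended = False

        def reclaim(self):
            # Hypervisor takes the physical GPU back (e.g., before VM migration);
            # the application's virtual handle stays valid.
            self.backend.release_device(self.device)
            self.device, self.suspended = None, True

        def reassign(self):
            # A physical GPU is available again (possibly on the destination host).
            self.device = self.backend.acquire_device()
            self.suspended = False

        def launch_kernel(self, kernel, args):
            # Application-facing call: a real implementation would block on a
            # condition variable while suspended; here we just check the flag.
            if self.suspended:
                raise RuntimeError("GPU reclaimed; call reassign() first")
            return self.backend.launch(self.device, kernel, args)

    ctx = InterposedContext(FakeGPUBackend())
    print(ctx.launch_kernel("vector_add", (1, 2)))
    ctx.reclaim()     # GPU pulled out from under the guest
    ctx.reassign()    # GPU handed back after migration
    print(ctx.launch_kernel("vector_add", (3, 4)))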

Paper summary. Towards distributed machine learning in shared clusters: a dynamically partitioned approach

This paper (by Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, and Shengen Yan) was put on arXiv in April 2017. This paper was a little confusing to read; I think it could have been presented better to make its contributions clearer. The paper aims to enable multiple distributed ML frameworks, say TensorFlow, Petuum, and MXNet, to share the same cluster. Enterprises have clusters managed by cluster management systems (CMSs). The paper starts with a review of existing CMSs and mentions shortcomings with each. It is unhappy with application-level scheduling, because each application reserves and keeps all allocated resources until completion, which leads to low resource utilization since scheduling is done for the peak/maximum resource needs of the application. In the task-level scheduling mode, applications use acquired resources to run a single task, release them as soon as the task completes, and petition for new resources to launch uncompleted tasks...
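
To see why the paper dislikes application-level scheduling, here is a small illustration with made-up numbers (and the simplifying assumption that a job's tasks run one after another): reserving the job's peak resource needs for its whole lifetime wastes whatever the smaller tasks do not use, whereas task-level acquisition only holds what each task actually needs.

    # Made-up numbers to contrast the two scheduling modes; not from the paper.
    from dataclasses import dataclass

    @dataclass
    class Task:
        duration_s: float
        workers: int          # workers this task actually needs

    def application_level_reserved(tasks):
        # Reserve the peak worker count for the entire job lifetime.
        peak = max(t.workers for t in tasks)
        return peak * sum(t.duration_s for t in tasks)        # worker-seconds held

    def task_level_used(tasks):
        # Acquire per task, release as soon as the task completes.
        return sum(t.workers * t.duration_s for t in tasks)   # worker-seconds held

    job = [Task(10, 8), Task(10, 2), Task(10, 2)]             # hypothetical job
    reserved, used = application_level_reserved(job), task_level_used(job)
    print(f"application-level holds {reserved:.0f} worker-s, task-level holds {used:.0f} "
          f"(utilization {used / reserved:.0%} under application-level scheduling)")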
