Computational science and complex system administration rely on being able to model user interactions. In managing HPC, HTC and grid systems, the user workload (users' job submission behaviour) is an important metric when designing systems or scheduling algorithms. Most simulators are either inflexible or tied to proprietary scheduling systems. For system administrators, being able to model how a scheduling algorithm behaves, or how modifying system configurations affects job completion rates, is critical. Within computer science research, many algorithms are presented with no real description or verification of their behaviour. In this paper we present the Cluster Discrete Event Simulator (CDES) as a strong candidate for HPC workload simulation. Built around an open framework, CDES can take system definitions and real usage logs from multiple platforms, and can be interfaced with any scheduling algorithm through an API. CDES has been tested against three years of usage logs from a production-level HPC system and verified to greater than 95% accuracy.
|Title of host publication: Proceedings - IEEE International Symposium on Distributed Simulation and Real-Time Applications, DS-RT
|Publisher: Institute of Electrical and Electronics Engineers Inc.
|Published - 13 Nov 2014
|18th IEEE/ACM International Symposium on Distributed Simulations and Real Time Applications - Aeronautics and Space Institute, Toulouse, France
Duration: 1 Oct 2014 → 3 Oct 2014
Conference number: 18
http://ds-rt.com/2014/ (Link to Conference Website)