De Groen, R., & Garnock-Jones, T. (2025). Fine-grained Fault Tolerance in Distributed Training Toolkits using the Syndicated Actor Model. In MIND 2025 - Proceedings of the 1st International Workshop on Next-Gen Middleware for MLOps in Distributed Systems (pp. 1-6). Association for Computing Machinery, Inc. https://doi.org/10.1145/3774902.3776630