Parallel and Distributed Systems Group

Computer Science Department of Telecom SudParis

PhD defense: Alexis Colin – November 28th – From trace collection to the prediction of the behaviour of parallel applications

Dear colleagues,

I am pleased to invite you to my PhD defense, entitled “From trace collection to the prediction of the behaviour of parallel applications”. The abstract is below.

The defense will be held in French on Monday, November 28 at 2:00 pm, in amphitheater 3 of the Télécom SudParis building at 19 place Marguerite Perey, 91120 Palaiseau. Videoconference access will be available at the following URL: https://webconf.imt.fr/frontend/fra-v2m-fsg-cuu.

The jury will be composed of:
– Mrs. Amel Bouzeghoub, Professor – Télécom SudParis (Examiner)
– Mr. Patrick Carribault, Researcher – CEA/DAM (Examiner)
– Mr. Denis Conan, Associate Professor HDR – Télécom SudParis (Thesis director)
– Mrs. Camille Coti, Professor – Université du Québec à Montréal (Reviewer)
– Mr. Arnaud Legrand, Research Director – INRIA Grenoble (Examiner)
– Mr. Samuel Thibault, Professor – Université de Bordeaux (Reviewer)
– Mr. François Trahay, Associate Professor HDR – Télécom SudParis (Co-supervisor)

The defense will be followed by a buffet.

Abstract:

In order to exploit the resources of servers and supercomputers, developers use specific programming models that are implemented by runtimes. Runtimes allow each program to fully exploit the capacities of the machine that executes it. To do this, runtimes make decisions that have a direct impact on the performance of the programs. In order to make good decisions, runtimes try to anticipate the future behavior of the programs, but the means at their disposal are limited.

We present Pythia, a generic oracle allowing runtimes to predict the future behavior of a program. We describe how to record an execution trace of a program and capture its structure in the form of a grammar. We develop an efficient algorithm capable of building such a grammar on the fly during the execution of a program without degrading its performance. We then show how to use a grammar representing the structure of a program execution to predict the program's behavior during subsequent executions. In particular, Pythia makes it possible to explore a probability-weighted tree of the potential next actions of a program.

The evaluation of our work shows that the predictions of Pythia can be used to implement optimizations within a runtime. We have also demonstrated the usability of Pythia by using it to implement an adaptive parallelism strategy within an existing OpenMP runtime.

New paper “PYTHIA: an oracle to guide runtime system decisions” to be presented at Cluster’22

Available online: https://hal.archives-ouvertes.fr/hal-03750441/document

Abstract

Runtime systems are commonly used by parallel applications in order to efficiently exploit the underlying hardware resources. A runtime system hides the complexity of the management of the hardware and exposes a high-level interface to application developers. To this end, it makes decisions by relying on heuristics that estimate the future behavior of the application. In this paper, we propose PYTHIA, a library that serves as an oracle capable of predicting the future behavior of an application, so that the runtime system can make more informed decisions. PYTHIA builds on the deterministic nature of many HPC applications: by recording an execution trace, PYTHIA captures the application's main behavior. The trace can be provided for future executions of the application, and a runtime system can ask for predictions of future program behavior. We evaluate PYTHIA on 13 MPI applications and show that PYTHIA can accurately predict the future of most of these applications, even when varying the problem size. We demonstrate how PYTHIA predictions can guide a runtime system optimization by implementing an adaptive thread parallelism strategy in the GNU OpenMP runtime system. The evaluation shows that, thanks to PYTHIA's predictions, the adaptive strategy reduces the execution time of an application by up to 38%.
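
To make the record-then-predict idea concrete, here is a minimal sketch in Python. It is not PYTHIA's API or algorithm: it swaps in a simple fixed-length context frequency table, and the function names, the `context_len` parameter, and the event names in the trace are all hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def record_model(trace, context_len=2):
    """Count which event follows each window of `context_len` events."""
    model = defaultdict(Counter)
    for i in range(context_len, len(trace)):
        context = tuple(trace[i - context_len:i])
        model[context][trace[i]] += 1
    return model


def predict_next(model, recent_events, context_len=2):
    """Return {event: probability} for the event following `recent_events`."""
    context = tuple(recent_events[-context_len:])
    counts = model.get(context)
    if not counts:
        return {}  # this context never appeared in the reference trace
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}


# Hypothetical trace from a reference run: an iterative code that usually
# loops back to "compute" but occasionally writes a checkpoint.
reference_trace = [
    "compute", "send", "recv",
    "compute", "send", "recv",
    "compute", "send", "recv", "checkpoint",
    "compute", "send", "recv",
]

model = record_model(reference_trace)
prediction = predict_next(model, ["send", "recv"])
```

With this trace, `prediction` assigns probability 2/3 to `compute` and 1/3 to `checkpoint`, since the pair `send`, `recv` is followed twice by `compute` and once by `checkpoint`. A runtime could query such a table at a decision point and act on the most likely next event.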

New paper “Why Globally Re-shuffle? Revisiting Data Shuffling in Large Scale Deep Learning” to be presented at IPDPS’22.

Available online: https://hal.archives-ouvertes.fr/hal-03599740/document

Abstract

Stochastic gradient descent (SGD) is the most prevalent algorithm for training Deep Neural Networks (DNN). SGD iterates over the input dataset in each training epoch, processing data samples in a random-access fashion. Because this puts enormous pressure on the I/O subsystem, the most common approach to distributed SGD in HPC environments is to replicate the entire dataset to node-local SSDs. However, due to rapidly growing dataset sizes, this approach has become increasingly infeasible. Surprisingly, the questions of why and to what extent random access is required have not received much attention in the literature from an empirical standpoint.

In this paper, we revisit data shuffling in DL workloads to investigate the viability of partitioning the dataset among workers and performing only a partial distributed exchange of samples in each training epoch. Through extensive experiments on up to 2,048 GPUs of ABCI and 4,096 compute nodes of Fugaku, we demonstrate that, in practice, the validation accuracy of global shuffling can be maintained when the partial distributed exchange is carefully tuned. We provide a solution implemented in PyTorch that enables users to control the proposed data exchange scheme.
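
The partition-plus-partial-exchange idea can be sketched as a single-process simulation in plain Python. This is not the paper's PyTorch implementation: the function names and the `exchange_fraction` parameter are hypothetical, and a real deployment would move samples between nodes rather than between in-memory lists.

```python
import random


def partition(num_samples, num_workers):
    """Assign sample indices to workers in a strided fashion."""
    return [list(range(w, num_samples, num_workers)) for w in range(num_workers)]


def partial_exchange(shards, exchange_fraction, rng):
    """Simulate one epoch: each worker contributes a fraction of its shard
    to a shared pool, receives the same number of samples back, and then
    shuffles its shard locally. Returns new shards; the input is not modified.
    """
    kept, pool, counts = [], [], []
    for shard in shards:
        shuffled = rng.sample(shard, len(shard))  # shuffled copy
        k = int(len(shard) * exchange_fraction)
        pool.extend(shuffled[:k])   # samples this worker gives away
        kept.append(shuffled[k:])   # samples that stay local
        counts.append(k)

    rng.shuffle(pool)  # a worker may receive some of its own samples back

    new_shards, start = [], 0
    for local, k in zip(kept, counts):
        shard = local + pool[start:start + k]
        start += k
        rng.shuffle(shard)          # local shuffle before training
        new_shards.append(shard)
    return new_shards


rng = random.Random(0)
shards = partition(16, 4)
next_epoch = partial_exchange(shards, 0.25, rng)
```

Setting `exchange_fraction` to 0 degenerates to purely local shuffling, and setting it to 1 pools every sample, which approximates a global shuffle. Values in between trade communication volume against how well the shards are mixed.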

New paper “J-NVM: Off-heap Persistent Objects in Java” to be presented at SOSP’21

New paper “J-NVM: Off-heap Persistent Objects in Java” to be presented at SOSP’21. Congrats to Anatole, Yohan, Kwabena, Pierre and Gaël!

New paper “Montsalvat: Intel SGX Shielding for GraalVM Native Images” to be presented at Middleware’21

New paper “Montsalvat: Intel SGX Shielding for GraalVM Native Images” to be presented at Middleware’21. Congrats to Gaël!

New paper “The Serverless Shell” to be presented at Middleware’21

New paper “The Serverless Shell” to be presented at Middleware’21. Congrats to Aurele and Pierre!

New paper “Highly-available and consistent group collaboration at the edge with Colony” to be presented at Middleware’21

New paper “Highly-available and consistent group collaboration at the edge with Colony” to be presented at Middleware’21. Congrats to Pierre!

New paper “Efficient Replication via Timestamp Stability” to be presented at Eurosys’21

New paper “Efficient Replication via Timestamp Stability” to be presented at Eurosys’21. Congrats to Pierre!

New paper “FaaSCache: an opportunistic free caching system for FaaS platforms” to be presented at Eurosys’21

New paper “FaaSCache: an opportunistic free caching system for FaaS platforms” to be presented at Eurosys’21. Congrats to Mathieu!

New paper “EZIOTracer: Unifying Kernel and User Space I/O Tracing for Data-Intensive Applications” to be presented at the CHEOPS workshop of Eurosys’21

New paper “EZIOTracer: Unifying Kernel and User Space I/O Tracing for Data-Intensive Applications” to be presented at the CHEOPS workshop of Eurosys’21. Congrats to Alexis C and François!

New paper “NVCache: A Plug-and-Play NVMM-based I/O booster for Legacy Systems” to be presented at DSN’21

New paper “NVCache: A Plug-and-Play NVMM-based I/O booster for Legacy Systems” to be presented at DSN’21. Congrats to Rémi and Gaël!

New paper “Transparent Overlapping of Blocking Communication in MPI Applications” to be presented at IEEE HPCC’20

New paper “Transparent Overlapping of Blocking Communication in MPI Applications” to be presented at IEEE HPCC’20. Congrats to Alexis, Elisabeth, François and Gaël!

New paper “Leaderless State-Machine Replication: Specification, Properties, Limits” to be presented at DISC’20

New paper “Leaderless State-Machine Replication: Specification, Properties, Limits” to be presented at DISC’20. Congrats to Pierre and Tuanir!

Muktikanta Sa joined the PDS group as postdoc

Muktikanta Sa joined the PDS group as a postdoc. Welcome!

Mathieu Bacou joined the PDS group as associate professor

Mathieu Bacou joined the PDS group as an associate professor. Welcome!

Aurèle Maheo joined the PDS group as postdoc

Aurèle Maheo joined the PDS group as a postdoc. Welcome!

New paper “State-Machine Replication for Planet-Scale Systems” to be presented at Eurosys’20

New paper “State-Machine Replication for Planet-Scale Systems” to be presented at Eurosys’20. Congrats to Pierre and Tuanir!

New paper “Using differential execution analysis to identify thread interference”. To appear in IEEE Transactions on Parallel and Distributed Systems

Abstract: Understanding the performance of a multi-threaded application is difficult. The threads interfere when they access the same shared resource, which slows down their execution. Unfortunately, current profiling tools report the hardware components or the synchronization primitives that saturate, but they cannot tell if the saturation is the cause of a performance bottleneck. In this paper, we propose a holistic metric able to pinpoint the blocks of code that suffer interference the most, regardless of the interference cause. Our metric uses performance variation as a universal indicator of interference problems. With an evaluation of 27 applications, we show that our metric can identify interference problems caused by 6 different kinds of interference in 9 applications. We are able to easily remove 7 of the bottlenecks, which leads to a performance improvement of up to 9 times.

https://hal.archives-ouvertes.fr/hal-02179717v1
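
As a rough illustration of using performance variation to rank code blocks, the sketch below scores each block by the coefficient of variation of its measured durations. This is an assumption made for illustration and not the metric defined in the paper; the block names and timings are hypothetical.

```python
from statistics import mean, pstdev


def variation_score(durations):
    """Coefficient of variation: standard deviation divided by the mean."""
    avg = mean(durations)
    return pstdev(durations) / avg if avg > 0 else 0.0


def rank_blocks(timings):
    """Order block names from most to least variable."""
    return sorted(timings, key=lambda name: variation_score(timings[name]), reverse=True)


# Hypothetical per-block durations (in ms) collected over four executions.
timings = {
    "parse_input": [10.0, 10.0, 10.0, 10.0],
    "update_shared_queue": [5.0, 15.0, 5.0, 15.0],
    "local_compute": [20.0, 22.0, 20.0, 22.0],
}

ranking = rank_blocks(timings)
```

Here `update_shared_queue` ranks first because its duration swings between 5 ms and 15 ms, which makes it the first candidate to inspect for contention on a shared resource.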

New paper “ScalOMP: analyzing the Scalability of OpenMP applications” to be presented at IWOMP’19

Anton Daumen will present his work “ScalOMP: analyzing the Scalability of OpenMP applications” at IWOMP’19. 

His paper is available online: https://hal.archives-ouvertes.fr/hal-02179726

Abstract: Achieving good scalability from parallel codes is becoming increasingly difficult due to the hardware becoming more and more complex. Performance tools help developers, but their use is sometimes complicated and very iterative. In this paper, we propose a simple methodology for assessing the scalability and for detecting performance problems in an OpenMP application. This methodology is implemented in a performance analysis tool named ScalOMP that relies on the capabilities of OMPT for analyzing OpenMP applications. ScalOMP reports the code regions with scalability issues and suggests optimization strategies for those issues. The evaluation shows that ScalOMP incurs low overhead and that its suggestions lead to significant performance improvements in several OpenMP applications.

Yohan Pipereau, Damien Thenot, and Boubacar Kane joined the PDS group as PhD students

Yohan Pipereau, Damien Thenot, and Boubacar Kane joined the PDS group as PhD students. Welcome!