Parallel and Distributed Systems Group

Computer Science Department of Telecom SudParis

New paper “KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training”, to be presented at NeurIPS’23

Available online: https://hal.archives-ouvertes.fr/hal-03750441/document

Code available at https://github.com/TruongThaoNguyen/kakurenbo

Authors: Thao Truong Nguyen, Balazs Gerofi, Edgar Josafat Martinez-Noriega, François Trahay, Mohamed Wahib.

Abstract: This paper proposes a method for hiding the least-important samples during the training of deep neural networks to increase efficiency, i.e., to reduce the cost of training. Using information about the loss and prediction confidence during training, we adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process, without significantly degrading accuracy. We explore the convergence properties when accounting for the reduction in the number of SGD updates. Empirical results on various large-scale datasets and models used directly in image classification and segmentation show that while the with-replacement importance sampling algorithm performs poorly on large datasets, our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
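The abstract describes choosing which samples to hide from their loss and prediction confidence. The Python sketch below shows what such a per-epoch selection could look like under one assumed rule: only samples that the model already predicts correctly with confidence at or above a threshold are candidates, and the lowest-loss candidates are hidden, up to a maximum fraction of the dataset. The function names, the threshold and the fraction are hypothetical; this is not KAKURENBO's actual algorithm or code.

```python
def select_hidden(losses, confidences, correct, max_fraction, conf_threshold):
    """Pick samples to hide in the next epoch (hypothetical selection rule).

    A sample is a candidate only if it is predicted correctly with
    confidence >= conf_threshold; among candidates, the lowest-loss ones
    are hidden, up to max_fraction of the dataset.
    """
    budget = int(max_fraction * len(losses))
    candidates = [
        i for i in range(len(losses))
        if correct[i] and confidences[i] >= conf_threshold
    ]
    candidates.sort(key=lambda i: losses[i])  # least important (lowest loss) first
    return set(candidates[:budget])


def visible_indices(n_samples, hidden):
    """Indices that are actually trained on in the next epoch."""
    return [i for i in range(n_samples) if i not in hidden]


# Toy per-sample statistics recorded during the previous epoch.
losses = [0.10, 2.00, 0.05, 0.30, 0.20]
confidences = [0.95, 0.40, 0.99, 0.90, 0.60]
correct = [True, False, True, True, True]
hidden = select_hidden(losses, confidences, correct, max_fraction=0.4, conf_threshold=0.8)
visible = visible_indices(len(losses), hidden)
print(sorted(hidden), visible)  # [0, 2] [1, 3, 4]
```

Hiding samples shortens each epoch but also reduces the number of SGD updates per epoch, which is why the convergence analysis has to account for that reduction.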

PhD defense: Anatole Lefort, March 24th – A Support for Persistent Memory in Java

Hi,

It is my pleasure to invite you all to the defense of my Ph.D. thesis,
prepared at Télécom SudParis and supervised by Pierre Sutra & Gaël Thomas.

    When: Friday, March 24 @ 9:30 A.M.
    Title: “A Support for Persistent Memory in Java”
    Virtual link: https://webconf.imt.fr/frontend/gae-fgg-lwf-vgs

It will be held in a hybrid format; all details are available below.

Hope to see you there!

Cheers,

Anatole Lefort


--- Full Details ---

WHO: Anatole Lefort, Télécom SudParis

WHEN: Friday, March 24 @ 9:30 A.M.

WHERE:

   Amphi 7, Télécom (Sud)Paris
   19 place Marguerite Perey, 91120 Palaiseau, France
   Access: Gare Massy-Palaiseau (TGV, RER) -> Bus 91.06.

TITLE: “A Support for Persistent Memory in Java”

ABSTRACT:

Recently released non-volatile main memory (NVMM), as fast and durable
memory, dramatically increases storage performance over traditional
media (SSD, hard disk).
A substantial and unique property of NVMM is byte-addressability:
complex in-memory data structures, maintained with regular load/store
instructions, can now survive machine power cycles, software faults or
system crashes.
However, correctly managing persistence with the fine grain of memory
instructions is laborious, with increased risk of compromising data
integrity and recovery at any misstep.
Programming abstractions from software libraries and support from
language runtime and compilers are necessary to avoid memory bugs that
are exacerbated with persistence.

In this thesis, we address the challenges of supporting persistent
memory in managed language environments by introducing J-NVM, a
framework to efficiently access NVMM in Java.
With J-NVM, we demonstrate how to design an efficient, simple and
complete interface to weave NVMM-devised persistence into
object-oriented programming, while remaining unobtrusive to the language
runtime itself.
In detail, J-NVM offers a fully-fledged interface to persist plain Java
objects using failure-atomic sections.
This interface relies internally on proxy objects that intermediate
direct off-heap access to NVMM.
The framework also provides a library of highly-optimized persistent
data types that resist reboots and power failures.
We evaluate J-NVM by implementing a persistent backend for Infinispan,
an industrial-grade data store.
Our experimental results, obtained with a TPC-B like benchmark and YCSB,
show that J-NVM is consistently faster than other approaches at
accessing NVMM in Java.
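The idea of a failure-atomic section whose writes go through proxy objects can be illustrated with a small Python sketch. It is not J-NVM's API (J-NVM is a Java framework): `Region`, `failure_atomic` and `AccountProxy` are hypothetical names, the "NVMM" is an ordinary volatile `bytearray`, a raised exception stands in for a crash, and undo logging is an assumed implementation strategy. Real persistent-memory code would also need cache-line flushes and fences so that the log is durable before the data it protects is overwritten.

```python
import struct
from contextlib import contextmanager


class Region:
    """Stand-in for an NVMM region: a flat byte buffer (volatile in this sketch)."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.undo_log = None  # list of (offset, old_bytes) while a section is open

    def write(self, offset, data):
        if self.undo_log is not None:
            # Undo logging: save the old bytes before overwriting them.
            self.undo_log.append((offset, bytes(self.buf[offset:offset + len(data)])))
        self.buf[offset:offset + len(data)] = data

    @contextmanager
    def failure_atomic(self):
        """All writes inside the block take effect together, or none do."""
        self.undo_log = []
        try:
            yield
        except BaseException:
            # Roll back in reverse order, then let the error propagate.
            for offset, old in reversed(self.undo_log):
                self.buf[offset:offset + len(old)] = old
            raise
        finally:
            self.undo_log = None


class AccountProxy:
    """Proxy object: the balance lives in the region, not in the Python object."""

    def __init__(self, region, offset):
        self.region = region
        self.offset = offset

    @property
    def balance(self):
        return struct.unpack_from("<q", self.region.buf, self.offset)[0]

    @balance.setter
    def balance(self, value):
        self.region.write(self.offset, struct.pack("<q", value))


region = Region(64)
a, b = AccountProxy(region, 0), AccountProxy(region, 8)
a.balance = 100
with region.failure_atomic():  # transfer 30 from a to b
    a.balance -= 30
    b.balance += 30
try:
    with region.failure_atomic():
        a.balance -= 50
        raise RuntimeError("simulated crash")  # b is never credited
except RuntimeError:
    pass
print(a.balance, b.balance)  # 70 30
```

The interrupted second transfer leaves no partial update behind, which is the guarantee a failure-atomic section is meant to provide.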

JURY MEMBERS:

Mr. Paolo ROMANO, Associate Professor, University of Lisbon - Reviewer
Mr. Vivien QUEMA, Full Professor, Grenoble INP/ENSIMAG - Reviewer
Ms. Panagiota FATOUROU, Full Professor, University of Crete - Examiner
Ms. Sara BOUCHENAK, Full Professor, INSA Lyon - Examiner
Mr. Pierre SUTRA, Associate Professor, Télécom SudParis - Advisor
Mr. Gaël THOMAS, Full Professor, Télécom SudParis - Advisor

--- EOF ---

PhD defense: Alexis Colin – November 28th – From trace collection to the prediction of the behaviour of parallel applications

Dear colleagues,

It is my pleasure to invite you to the defense of my PhD thesis, entitled “From trace collection to the prediction of the behaviour of parallel applications”. The abstract is below. The defense will take place in French on Monday, November 28 at 2:00 pm, in amphitheater 3 of the Télécom SudParis building at 19 place Marguerite Perey, 91120 Palaiseau. Videoconference access will be available at the following URL: https://webconf.imt.fr/frontend/fra-v2m-fsg-cuu.

The jury will be composed of:
– Mrs. Amel Bouzeghoub, Professor – Télécom SudParis (Examiner)
– Mr. Patrick Carribault, Researcher – CEA/DAM (Examiner)
– Mr. Denis Conan, Associate Professor HDR – Télécom SudParis (Director)
– Mrs. Camille Coti, Professor – Université du Québec à Montréal (Reviewer)
– Mr. Arnaud Legrand, Research director – INRIA Grenoble (Examiner)
– Mr. Samuel Thibault, Professor – Université de Bordeaux (Reviewer)
– Mr. François Trahay, Associate Professor HDR – Télécom SudParis (Co-director)

The defense will be followed by a buffet.

Abstract:

In order to exploit the resources of servers and supercomputers, developers use specific programming models that are implemented by runtimes. Runtimes allow each program to fully exploit the capacities of the machine that executes it. To do this, runtimes make decisions that have a direct impact on the performance of programs. To make good decisions, runtimes try to anticipate the future behavior of programs, but the means at their disposal are limited.

We present Pythia, a generic oracle that allows runtimes to predict the future behavior of a program. We describe how to record an execution trace and capture its structure in the form of a grammar. We develop an algorithm capable of building such a grammar on the fly during the execution of a program without degrading its performance. We then show how to use a grammar representing the structure of a program execution to predict its future behavior during subsequent executions. In particular, Pythia makes it possible to explore a probabilistic tree of a program's potential next actions.

The evaluation of our work shows that the predictions of Pythia can be used to implement optimizations within a runtime. We have also demonstrated the usability of Pythia by using it to implement an adaptive parallelism strategy within an existing OpenMP runtime.
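Pythia captures the trace structure as a grammar. The Python sketch below deliberately swaps in a simpler technique, a fixed-length context (order-k Markov) model, only to illustrate how a recorded trace can yield a probability tree over a program's next actions. `build_model`, `predict_tree`, the event names and `k=2` are illustrative assumptions, not Pythia's API or algorithm.

```python
from collections import Counter, defaultdict


def build_model(trace, k=2):
    """Count which event follows each length-k context in a recorded trace."""
    model = defaultdict(Counter)
    for i in range(k, len(trace)):
        model[tuple(trace[i - k:i])][trace[i]] += 1
    return model


def predict_tree(model, history, depth, k=2):
    """Return {tuple_of_next_events: probability} for paths up to `depth` steps ahead."""
    tree = {(): 1.0}
    for _ in range(depth):
        next_level = {}
        for path, p in tree.items():
            context = tuple((list(history) + list(path))[-k:])
            counts = model.get(context)
            if not counts:
                continue  # context never seen in the trace: this branch stops here
            total = sum(counts.values())
            for event, c in counts.items():
                next_level[path + (event,)] = p * c / total
        tree = next_level
    return tree


trace = ["init", "send", "recv", "compute", "send", "recv", "compute",
         "send", "recv", "finalize"]
model = build_model(trace)
tree = predict_tree(model, history=["compute", "send"], depth=2)
print(tree)  # recv then compute with p = 2/3, recv then finalize with p = 1/3
```

A grammar can capture nested, repeated structure (loops within loops) that a fixed context length misses, which is one reason to prefer it over this simpler model.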

New paper “PYTHIA: an oracle to guide runtime system decisions” to be presented at Cluster’22


Available online: https://hal.archives-ouvertes.fr/hal-03750441/document

Abstract

Runtime systems are commonly used by parallel applications in order to efficiently exploit the underlying hardware resources. A runtime system hides the complexity of the management of the hardware and exposes a high-level interface to application developers. To this end, it makes decisions by relying on heuristics that estimate the future behavior of the application. In this paper, we propose PYTHIA, a library that serves as an oracle capable of predicting the future behavior of an application, so that the runtime system can make more informed decisions. PYTHIA builds on the deterministic nature of many HPC applications: by recording an execution trace, PYTHIA captures the main behavior of the application. The trace can be provided for future executions of the application, and a runtime system can ask for predictions of future program behavior. We evaluate PYTHIA on 13 MPI applications and show that PYTHIA can accurately predict the future of most of these applications, even when varying the problem size. We demonstrate how PYTHIA predictions can guide a runtime system optimization by implementing an adaptive thread parallelism strategy in the GNU OpenMP runtime system. The evaluation shows that, thanks to PYTHIA predictions, the adaptive strategy reduces the execution time of an application by up to 38%.
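The abstract does not detail the adaptive thread parallelism strategy. The Python sketch below assumes one plausible policy to show how a prediction could feed such a decision: give each thread at least a minimum amount of work, so that short parallel regions do not wake up every thread. `choose_num_threads`, its parameters and the 50 µs default are hypothetical and are not the policy implemented in the paper.

```python
def choose_num_threads(predicted_work_us, max_threads, min_work_per_thread_us=50.0):
    """Choose a thread count for the next parallel region (hypothetical heuristic).

    predicted_work_us: predicted single-thread duration of the region, or None
    when the oracle has no prediction. Each thread is given at least
    min_work_per_thread_us of work, capped at max_threads.
    """
    if predicted_work_us is None:
        return max_threads  # no prediction: keep the runtime's default
    wanted = int(predicted_work_us // min_work_per_thread_us)
    return max(1, min(max_threads, wanted))


for work in (None, 10, 120, 10_000):
    print(work, choose_num_threads(work, max_threads=16))
# prints: None 16, then 10 1, then 120 2, then 10000 16
```

Falling back to the default when no prediction is available keeps the runtime's behavior unchanged for applications whose future cannot be predicted.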

New paper “Why Globally Re-shuffle? Revisiting Data Shuffling in Large Scale Deep Learning” to be presented at IPDPS’22.


Available online: https://hal.archives-ouvertes.fr/hal-03599740/document

Abstract

Stochastic gradient descent (SGD) is the most prevalent algorithm for training Deep Neural Networks (DNN). In each training epoch, SGD iterates over the input dataset, processing data samples in random order. Because this puts enormous pressure on the I/O subsystem, the most common approach to distributed SGD in HPC environments is to replicate the entire dataset to node-local SSDs. However, with rapidly growing dataset sizes, this approach has become increasingly infeasible. Surprisingly, the questions of why and to what extent random access is required have received little empirical attention in the literature.

In this paper, we revisit data shuffling in DL workloads to investigate the viability of partitioning the dataset among workers and performing only a partial distributed exchange of samples in each training epoch. Through extensive experiments on up to 2,048 GPUs of ABCI and 4,096 compute nodes of Fugaku, we demonstrate that, in practice, the validation accuracy obtained with global shuffling can be maintained when the partial distributed exchange is carefully tuned. We provide a solution implemented in PyTorch that enables users to control the proposed data exchange scheme.
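The partial distributed exchange can be pictured with a single-process Python simulation. It assumes a simple scheme that is not taken from the paper: each epoch, every worker shuffles its partition, sends a fraction `q` of it to the worker `shift` positions away (with `shift` drawn at random per epoch), and shuffles what it keeps plus what it receives. `partial_exchange`, `q` and `shift` are hypothetical names, and this is not the API of the authors' PyTorch solution.

```python
import random


def partial_exchange(partitions, q, rng):
    """One epoch of partial distributed exchange, simulated in a single process.

    partitions: list of per-worker sample lists.
    q: fraction of its local samples each worker sends to another worker.
    """
    n = len(partitions)
    shift = rng.randrange(1, n) if n > 1 else 0
    kept, sent = [], []
    for part in partitions:
        local = list(part)
        rng.shuffle(local)
        k = int(q * len(local))
        sent.append(local[:k])
        kept.append(local[k:])
    new_partitions = []
    for w in range(n):
        merged = kept[w] + sent[(w - shift) % n]  # receive from worker (w - shift) mod n
        rng.shuffle(merged)  # local shuffle before iterating this epoch
        new_partitions.append(merged)
    return new_partitions


rng = random.Random(0)
parts = [list(range(10 * w, 10 * (w + 1))) for w in range(4)]  # 4 workers x 10 samples
parts = partial_exchange(parts, q=0.3, rng=rng)
print([len(p) for p in parts])  # [10, 10, 10, 10]
```

With `q = 0` each worker only shuffles its own partition (purely local shuffling); a larger `q` mixes partitions faster at the cost of more network traffic, which is the trade-off the tuning of the exchange has to balance.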

New paper “J-NVM: Off-heap Persistent Objects in Java” to be presented at SOSP’21

New paper “J-NVM: Off-heap Persistent Objects in Java” to be presented at SOSP’21. Congrats to Anatole, Yohan, Kwabena, Pierre and Gaël!

New paper “Montsalvat: Intel SGX Shielding for GraalVM Native Images” to be presented at Middleware’21

New paper “Montsalvat: Intel SGX Shielding for GraalVM Native Images” to be presented at Middleware’21. Congrats to Gaël!

New paper “The Serverless Shell” to be presented at Middleware’21

New paper “The Serverless Shell” to be presented at Middleware’21. Congrats to Aurele and Pierre!

New paper “Highly-available and consistent group collaboration at the edge with Colony” to be presented at Middleware’21

New paper “Highly-available and consistent group collaboration at the edge with Colony” to be presented at Middleware’21. Congrats to Pierre!

New paper “Efficient Replication via Timestamp Stability” to be presented at Eurosys’21

New paper “Efficient Replication via Timestamp Stability” to be presented at Eurosys’21. Congrats to Pierre!

New paper “FaaSCache: an opportunistic free caching system for FaaS platforms” to be presented at Eurosys’21

New paper “FaaSCache: an opportunistic free caching system for FaaS platforms” to be presented at Eurosys’21. Congrats to Mathieu!

New paper “EZIOTracer: Unifying Kernel and User Space I/O Tracing for Data-Intensive Applications” to be presented at the CHEOPS workshop of Eurosys’21

New paper “EZIOTracer: Unifying Kernel and User Space I/O Tracing for Data-Intensive Applications” to be presented at the CHEOPS workshop of Eurosys’21. Congrats to Alexis C and François!

New paper “NVCache: A Plug-and-Play NVMM-based I/O booster for Legacy Systems” to be presented at DSN’21

New paper “NVCache: A Plug-and-Play NVMM-based I/O booster for Legacy Systems” to be presented at DSN’21. Congrats to Rémi and Gaël!

New paper “Transparent Overlapping of Blocking Communication in MPI Applications” to be presented at IEEE HPCC’20

New paper “Transparent Overlapping of Blocking Communication in MPI Applications” to be presented at IEEE HPCC’20. Congrats to Alexis, Elisabeth, François and Gaël!

New paper “Leaderless State-Machine Replication: Specification, Properties, Limits” to be presented at DISC’20

New paper “Leaderless State-Machine Replication: Specification, Properties, Limits” to be presented at DISC’20. Congrats to Pierre and Tuanir!

Muktikanta Sa joined the PDS group as postdoc

Muktikanta Sa joined the PDS group as postdoc, welcome!

Mathieu Bacou joined the PDS group as associate professor

Mathieu Bacou joined the PDS group as associate professor, welcome!

Aurèle Maheo joined the PDS group as postdoc

Aurèle Maheo joined the PDS group as postdoc, welcome!

New paper “State-Machine Replication for Planet-Scale Systems” to be presented at Eurosys’20

New paper “State-Machine Replication for Planet-Scale Systems” to be presented at Eurosys’20. Congrats to Pierre and Tuanir!

New paper “Using differential execution analysis to identify thread interference”. To appear in IEEE Transactions on Parallel and Distributed Systems

Abstract: Understanding the performance of a multi-threaded application is difficult. The threads interfere when they access the same shared resource, which slows down their execution. Unfortunately, current profiling tools report the hardware components or the synchronization primitives that saturate, but they cannot tell whether the saturation is the cause of a performance bottleneck. In this paper, we propose a holistic metric able to pinpoint the blocks of code that suffer the most from interference, regardless of the interference cause. Our metric uses performance variation as a universal indicator of interference problems. With an evaluation of 27 applications, we show that our metric can identify interference problems caused by 6 different kinds of interference in 9 applications. We were able to easily remove 7 of these bottlenecks, which leads to a performance improvement of up to 9 times.

https://hal.archives-ouvertes.fr/hal-02179717v1
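The abstract presents the metric only at a high level. The Python sketch below is a simplified stand-in, assuming per-block timings from two runs (a reference run and the run under study) and scoring each block by how much its time grows; `rank_interference`, the block names and the input format are hypothetical and do not reproduce the paper's metric.

```python
def rank_interference(ref_times, contended_times):
    """Rank code blocks by how much slower they run in the run under study.

    ref_times / contended_times: {block: total time spent in the block per
    thread, in seconds} from a reference run (e.g. fewer threads) and from the
    run under study. The score is the extra time, so the blocks whose
    performance varies most rank first, whatever causes the interference.
    """
    scores = {
        block: contended_times.get(block, t_ref) - t_ref
        for block, t_ref in ref_times.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ref = {"update_shared_counter": 1.0, "compute_kernel": 5.0, "write_output": 2.0}
contended = {"update_shared_counter": 4.0, "compute_kernel": 5.2, "write_output": 2.5}
for block, extra in rank_interference(ref, contended):
    print(f"{block}: +{extra:.2f} s")
```

Because the score only looks at variation in time, it flags `update_shared_counter` first without needing to know whether a lock, a cache line or memory bandwidth is responsible.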