About Me

Quod obstat viae fit via.

Professional Experience

Research Staff Member

AI Platform - Performance and Scale, IBM Research, 03/20-03/21

Led the release of the cloud-native version of IBM Streams as an open-source project under the Apache v2.0 license. Now called OpenStreams, the sources can be found here.

Distributed Streams Computing, IBM Research, 10/17-03/20

Designed and implemented TULIPS, a multi-threaded, ultra-low-latency user-space TCP/IP stack for the IBM Streams product that leverages the hardware offload accelerations found on the Mellanox ConnectX-5 device family. This new stack improves network performance by 4x to 6x, depending on the application workload.

Designed and contributed to the complete replacement of IBM Streams' distributed platform in favor of Kubernetes, resulting in the removal of 80% of the platform code base without any loss of functionality or guarantees such as fault tolerance. This new design uses the Operator model to expose IBM Streams' core concepts as first-class citizens within Kubernetes. This work was published and its tooling released as an open-source library.

[ low-latency networking; userspace tcp/ip; kubernetes ]

Senior HW/SW Engineer

Core Performance Team, Tower Research Capital, Inc, 10/15-10/17

Designed and implemented specialized networking hardware to minimize the latency of various trading subsystems. These solutions range from bump-in-the-wire data transcoders to more generic network controllers. Single-handedly authored every aspect of the solution: the hardware architecture, the software interface, and the Linux device driver.

Notably, designed and implemented market data transcoders for CME and LSE, used as bump-in-the-wire devices on shared microwave lines where latency is particularly critical. Both transcoders achieved 200ns wire-to-wire latencies and were designed in record time using the HardCaml framework, effectively making them the first hardware trading systems designed using OCaml to be successfully used in a production environment.

[ verilog; modelsim; quartus; ocaml; c++; linux driver ]

Senior Software Developer

Core Performance Team, Tower Research Capital, Inc, 01/14-10/15

Led the specification and design of the company's tick-to-trade infrastructure worldwide. The role included researching and recommending the proper hardware setup, and implementing and distributing the packet logger facility, the automated packet processing system producing daily time series of the trading activities, and latency computation tools to generate tick-to-trade data for each trading team as well as various other network-related metrics. Also designed and implemented a model-driven configuration framework that provides portability, extensibility, and validation to the company's tooling configuration.

[ c++; python; lisp; ibverbs; efvi ]

Research Staff Member

Workload Optimization Group, IBM Research, 07/11-01/14

Explored the use of RDMA-capable, low-latency, high-performance network fabrics to enable next-generation NoSQL data stores and real-time BigData analytics. Revisited common data store issues such as persistence, high-availability, and distributed transactions using the shared memory paradigm.

[ c++; ibverbs; rdma ]

Post-Doctoral Researcher

Workload Optimization Group, IBM Research, 10/10-07/11

Investigated the exploitation of low-level hardware acceleration mechanisms inside complex, multi-layered application workloads such as business rules management systems and database clients. Investigated solutions to bring zero-copy memory transfers to high levels of commercial software stacks. Implemented an RDMA-based memory update engine over InfiniBand and RoCE using uDAPL.

[ c++; simd; rdma ]

Research Associate

System-Level Synthesis Group, TIMA Laboratory, 09/05-10/10

Designed and implemented DNA-OS, a generic, component-based operating system kernel that targets heterogeneous, multi-core systems-on-chip. Using a micro-kernel design, it manages threads, messages, semaphores, dynamic memory, I/Os, drivers, and file system modules. Designed and implemented a component-based, automated environment to generate embedded applications. It supports RISC, DSP, and micro-controller processors while successfully making use of the DNA kernel.

[ c; assembly; arm; mips; sparc; dsp; mpsoc ]

Education

Doctorate in Computer Science

System-Level Synthesis Group, TIMA Laboratory, 09/06-10/10

This dissertation shows that complex, embedded software applications can effectively operate heterogeneous MP-SoCs with respect to flexibility, scalability, portability, and Time-To-Market. It presents an improved embedded software design flow that combines an application code generator, GECKO, and a novel software framework, APES, to achieve a high level of efficiency. Our contribution is twofold: 1) an improved embedded software design flow with several tools that enable the automatic construction of minimal and optimized binaries for a given application targeting a given MP-SoC, and 2) a modular and portable set of software components that includes traditional operating system mechanisms as well as support for multiple processors.

Open-Source Software

Name                   Language      Role               Description
ace                    C++           Author             Versatile configuration authoring/parsing system
bitstring              OCaml         Maintainer/Author  Bit stream manipulation library
minima.l               C/Lisp        Author             Minimal Lisp interpreter
openstreams            C++/Java/SPL  Maintainer/Author  Streams computing environment
ppx_hardcaml           OCaml         Author             PPX extension of HardCaml
ppx_deriving_hardcaml  OCaml         Author             PPX extension for HardCaml types
tulips                 C++           Author             Ultra-low latency user-space TCP/IP stack

Publications

2020

Scott Schneider, Xavier R. Guérin, Shaohan Hu, Kun-Lung Wu: A Cloud Native Platform for Stateful Streaming, ArXiv 2020

2015

Yandong Wang, Li Zhang, Jian Tan, Min Li, Yuqing Gao, Xavier Guérin, Xiaoqiao Meng, Shicong Meng: HydraDB: a resilient RDMA-driven key-value middleware for in-memory cluster computing, SC 2015: 22:1-22:11

2013

Liana L. Fong, Yuqing Gao, Xavier Guérin, Yonggang Liu, T. Salo, Seetharami Seelam, Wei Tan, Sandeep Tata: Toward a scale-out data-management middleware for low-latency enterprise computing, IBM Journal of Research and Development 57(3/4): 6 (2013)

2012

Xavier Guérin, Wei Tan, Yanbin Liu, Seetharami Seelam, Parijat Dube: Evaluation of Multi-core Scalability Bottlenecks in Enterprise Java Workloads, MASCOTS 2012: 308-317

2011

Xavier Guérin, Yanbin Liu, Parijat Dube, Seetharami Seelam, Pierre-Andre Paumelle: Scalability analysis of enterprise Java workloads on a multi-core system, IISWC 2011: 77

2009

Sang-Il Han, Soo-Ik Chae, Lisane B. de Brisolara, Luigi Carro, Katalin Popovici, Xavier Guérin, Ahmed Amine Jerraya, Kai Huang, Lei Li, Xiaolang Yan: Simulink®-based heterogeneous multiprocessor SoC design flow for mixed hardware/software refinement and simulation, Integration 42(2): 227-245 (2009)

Xavier Guérin, Frédéric Pétrot: A System Framework for the Design of Embedded Software Targeting Heterogeneous Multi-core SoCs, ASAP 2009: 153-160 (best paper award)

Alexandre Chagoya-Garzon, Xavier Guérin, Frédéric Rousseau, Frédéric Pétrot, Davide Rossetti, Alessandro Lonardo, Piero Vicini, Pier Stanislao Paolucci: Synthesis of Communication Mechanisms for Multi-tile Systems Based on Heterogeneous Multi-processor System-On-Chips, IEEE International Workshop on Rapid System Prototyping 2009: 48-54

2008

Katalin Popovici, Xavier Guérin, Frédéric Rousseau, Pier Stanislao Paolucci, Ahmed Amine Jerraya: Platform-based software design flow for heterogeneous MPSoC, ACM Trans. Embedded Comput. Syst. 7(4) (2008)

Patrice Gerin, Xavier Guérin, Frédéric Pétrot: Efficient Implementation of Native Software Simulation for MPSoC, DATE 2008: 676-681

2007

Sang-Il Han, Soo-Ik Chae, Lisane B. de Brisolara, Luigi Carro, Ricardo Reis, Xavier Guérin, Ahmed Amine Jerraya: Memory-efficient multithreaded code generation from Simulink for heterogeneous MPSoC, Design Autom. for Emb. Sys. 11(4): 249-283 (2007)

Xavier Guérin, Katalin Popovici, Wassim Youssef, Frédéric Rousseau, Ahmed Amine Jerraya: Flexible Application Software Generation for Heterogeneous Multi-Processor System-on-Chip, COMPSAC (1) 2007: 279-286

Kai Huang, Sang-Il Han, Katalin Popovici, Lisane B. de Brisolara, Xavier Guérin, Lei Li, Xiaolang Yan, Soo-Ik Chae, Luigi Carro, Ahmed Amine Jerraya: Simulink-Based MPSoC Design Flow: Case Study of Motion-JPEG and H.264, DAC 2007: 39-42

Katalin Popovici, Xavier Guérin, Frédéric Rousseau, Pier Stanislao Paolucci, Ahmed Amine Jerraya: Efficient Software Development Platforms for Multimedia Applications at Different Abstraction Levels, IEEE International Workshop on Rapid System Prototyping 2007: 113-122

Lisane B. de Brisolara, Sang-Il Han, Xavier Guérin, Luigi Carro, Ricardo Reis, Soo-Ik Chae, Ahmed Amine Jerraya: Reducing fine-grain communication overhead in multithread code generation for heterogeneous MPSoC, SCOPES 2007: 81-89

2006

Sang-Il Han, Xavier Guérin, Soo-Ik Chae, Ahmed Amine Jerraya: Buffer memory optimization for video codec application modeled in Simulink, DAC 2006: 689-694

Patents

2020

Xavier R. Guérin: Speculative execution in a distributed streaming system, United States US10657091B2

2019

Xavier R. Guérin, Scott Schneider, Xiang Ni: Adaptive locking in elastic threading systems, United States US20190377582A1

Xavier R. Guérin, Shicong Meng: Passive two-phase commit system for high-performance distributed transaction execution, United States US10296371B2

2018

Xavier R. Guérin, Shicong Meng: Multi-way, zero-copy, passive transaction log collection in distributed transaction systems, United States US20150347243A1

2016

Yuqing Gao, Xavier R. Guérin, Xiaoqiao Meng, Tiia Salo: Remote direct memory access (RDMA) high performance producer-consumer message processing, United States 9,495,325

Xavier R. Guérin, Tiia J. Salo: RDMA-optimized high-performance distributed cache, United States 9,378,179

Xavier R. Guérin: Keyboard with macro keys made up of positionally adjustable micro keys, United States 9,335,830

Yuqing Gao, Xavier R. Guérin, Graeme Johnson: High performance, distributed, shared, data grid for distributed Java virtual machine runtime artifacts, United States 9,332,083

Xavier R. Guérin, Yinglong Xia: Scheduling and execution of DAG-structured computation on RDMA-connected clusters, United States 9,300,749

2015

Xavier R. Guérin: Load balancing of distributed services, United States 9,667,711

Parijat Dube, Xavier R. Guérin, Seetharami R. Seelam: Method, apparatus and computer programs providing cluster-wide page management, United States 9,170,950

Xavier R. Guérin, Xiaoqiao Meng, David P. Olshefski, John M. Tracey: Automatic pinning and unpinning of virtual pages for remote direct memory access, United States 9,037,753

Megumi Ito, Michael Dawson, Xavier R. Guérin, Seetharami Seelam: Java native interface array handling in a distributed java virtual machine, United States 8,990,790

2014

Michael H. Dawson, Parijat Dube, Liana L. Fong, Yuqing Gao, Xavier R. Guérin, Michel H. T. Hack, Megumi Ito, Graeme Johnson, Nai K. Ling, Yanbin Liu, Xiaoqiao Meng, Pramod B. Nagaraja, Seetharami R. Seelam, Wei Tan, Li Zhang: Preferential execution of method calls in hybrid systems, United States 8,843,894