About Me
Quod obstat viae fit via. (What stands in the way becomes the way.)
Professional Experience
Research Staff Member
AI Platform - Performance and Scale, IBM Research, 03/20-03/21
Led the release of the cloud-native version of IBM Streams as an open-source project under the Apache v2.0 license. The project is now called OpenStreams, and its sources are publicly available.
Distributed Streams Computing, IBM Research, 10/17-03/20
Designed and implemented TULIPS, a multi-threaded, ultra-low-latency, user-space TCP/IP stack for the IBM Streams product that leverages the hardware offload features of the Mellanox ConnectX-5 device family. Depending on the application workload, the new stack improves network performance by 4x to 6x.
Designed, and contributed to, the complete replacement of IBM Streams' distributed platform with Kubernetes, removing 80% of the platform code base without any loss of functionality or of guarantees such as fault tolerance. The new design uses the Operator model to expose IBM Streams' core concepts as first-class citizens within Kubernetes. This work was published, and its tooling was released as an open-source library.
[ low-latency networking; userspace tcp/ip; kubernetes ]
Senior HW/SW Engineer
Core Performance Team, Tower Research Capital, Inc, 10/15-10/17
Designed and implemented specialized networking hardware to minimize the latency of various trading subsystems. These solutions ranged from bump-in-the-wire data transcoders to more generic network controllers. Single-handedly authored every aspect of each solution: the hardware architecture, the software interface, and the Linux device driver.
Notably, designed and implemented market data transcoders for CME and LSE, deployed as bump-in-the-wire devices on shared microwave lines where latency is particularly critical. Both transcoders achieved 200 ns wire-to-wire latency and were designed in record time using the HardCaml framework, making them the first hardware trading systems designed in OCaml to be used successfully in a production environment.
[ verilog; modelsim; quartus; ocaml; c++; linux driver ]
Senior Software Developer
Core Performance Team, Tower Research Capital, Inc, 01/14-10/15
Led the specification and design of the company's worldwide tick-to-trade infrastructure. The role included researching and recommending the proper hardware setup; implementing and distributing the packet logger facility; the automated packet processing system that produces daily time series of trading activity; and latency computation tools that generate tick-to-trade data for each trading team, along with various other network-related metrics. Also designed and implemented a model-driven configuration framework that brings portability, extensibility, and validation to the company's tooling configuration.
[ c++; python; lisp; ibverbs; efvi ]
Research Staff Member
Workload Optimization Group, IBM Research, 07/11-01/14
Explored the use of RDMA-capable, low-latency, high-performance network fabrics to enable next-generation NoSQL data stores and real-time Big Data analytics. Revisited common data store issues such as persistence, high availability, and distributed transactions using the shared memory paradigm.
[ c++; ibverbs; rdma ]
Post-Doctoral Researcher
Workload Optimization Group, IBM Research, 10/10-07/11
Investigated how to exploit low-level hardware acceleration mechanisms inside complex, multi-layered application workloads such as business rules management systems and database clients. Investigated solutions to bring zero-copy memory transfers to the upper layers of commercial software stacks. Implemented an RDMA-based memory update engine over InfiniBand and RoCE using uDAPL.
[ c++; simd; rdma ]
Research Associate
System-Level Synthesis Group, TIMA Laboratory, 09/05-10/10
Designed and implemented DNA-OS, a generic, component-based operating system kernel that targets heterogeneous, multi-core systems-on-chip. Built on a micro-kernel design, it manages threads, messages, semaphores, dynamic memory, I/O, drivers, and file system modules. Also designed and implemented a component-based, automated environment for generating embedded applications. It supports RISC, DSP, and microcontroller processors and builds on the DNA-OS kernel.
[ c; assembly; arm; mips; sparc; dsp; mpsoc ]
Education
Doctorate in Computer Science
System-Level Synthesis Group, TIMA Laboratory, 09/06-10/10
This dissertation shows that complex embedded software applications can effectively operate heterogeneous MP-SoCs with respect to flexibility, scalability, portability, and time-to-market. It presents an improved embedded software design flow that combines an application code generator, GECKO, with a novel software framework, APES, to achieve a high level of efficiency. Our contribution is twofold: 1) an improved embedded software design flow with several tools that enable the automatic construction of minimal, optimized binaries for a given application targeting a given MP-SoC, and 2) a modular, portable set of software components that includes traditional operating system mechanisms as well as support for multiple processors.
Open-Source Software
Name | Language | Role | Description |
---|---|---|---|
ace | C++ | Author | Versatile configuration authoring/parsing system |
bitstring | OCaml | Maintainer/Author | Bit stream manipulation library |
minima.l | C/Lisp | Author | Minimal Lisp interpreter |
openstreams | C++/Java/SPL | Maintainer/Author | Streams computing environment |
ppx_hardcaml | OCaml | Author | PPX extension of HardCaml |
ppx_deriving_hardcaml | OCaml | Author | PPX extension for HardCaml types |
tulips | C++ | Author | Ultra-low latency user-space TCP/IP stack |
Publications
2020
Scott Schneider, Xavier R. Guérin, Shaohan Hu, Kun-Lung Wu: A Cloud Native Platform for Stateful Streaming, arXiv 2020
2015
Yandong Wang, Li Zhang, Jian Tan, Min Li, Yuqing Gao, Xavier Guérin, Xiaoqiao Meng, Shicong Meng: HydraDB: a resilient RDMA-driven key-value middleware for in-memory cluster computing, SC 2015: 22:1-22:11
2013
Liana L. Fong, Yuqing Gao, Xavier Guérin, Yonggang Liu, T. Salo, Seetharami Seelam, Wei Tan, Sandeep Tata: Toward a scale-out data-management middleware for low-latency enterprise computing, IBM Journal of Research and Development 57(3/4): 6 (2013)
2012
Xavier Guérin, Wei Tan, Yanbin Liu, Seetharami Seelam, Parijat Dube: Evaluation of Multi-core Scalability Bottlenecks in Enterprise Java Workloads, MASCOTS 2012: 308-317
2011
Xavier Guérin, Yanbin Liu, Parijat Dube, Seetharami Seelam, Pierre-Andre Paumelle: Scalability analysis of enterprise Java workloads on a multi-core system, IISWC 2011: 77
2009
Sang-Il Han, Soo-Ik Chae, Lisane B. de Brisolara, Luigi Carro, Katalin Popovici, Xavier Guérin, Ahmed Amine Jerraya, Kai Huang, Lei Li, Xiaolang Yan: Simulink®-based heterogeneous multiprocessor SoC design flow for mixed hardware/software refinement and simulation, Integration 42(2): 227-245 (2009)
Xavier Guérin, Frédéric Pétrot: A System Framework for the Design of Embedded Software Targeting Heterogeneous Multi-core SoCs, ASAP 2009: 153-160 (best paper award)
Alexandre Chagoya-Garzon, Xavier Guérin, Frédéric Rousseau, Frédéric Pétrot, Davide Rossetti, Alessandro Lonardo, Piero Vicini, Pier Stanislao Paolucci: Synthesis of Communication Mechanisms for Multi-tile Systems Based on Heterogeneous Multi-processor System-On-Chips, IEEE International Workshop on Rapid System Prototyping 2009: 48-54
2008
Katalin Popovici, Xavier Guérin, Frédéric Rousseau, Pier Stanislao Paolucci, Ahmed Amine Jerraya: Platform-based software design flow for heterogeneous MPSoC, ACM Trans. Embedded Comput. Syst. 7(4) (2008)
Patrice Gerin, Xavier Guérin, Frédéric Pétrot: Efficient Implementation of Native Software Simulation for MPSoC, DATE 2008: 676-681
2007
Sang-Il Han, Soo-Ik Chae, Lisane B. de Brisolara, Luigi Carro, Ricardo Reis, Xavier Guérin, Ahmed Amine Jerraya: Memory-efficient multithreaded code generation from Simulink for heterogeneous MPSoC, Design Autom. for Emb. Sys. 11(4): 249-283 (2007)
Xavier Guérin, Katalin Popovici, Wassim Youssef, Frédéric Rousseau, Ahmed Amine Jerraya: Flexible Application Software Generation for Heterogeneous Multi-Processor System-on-Chip, COMPSAC (1) 2007: 279-286
Kai Huang, Sang-Il Han, Katalin Popovici, Lisane B. de Brisolara, Xavier Guérin, Lei Li, Xiaolang Yan, Soo-Ik Chae, Luigi Carro, Ahmed Amine Jerraya: Simulink-Based MPSoC Design Flow: Case Study of Motion-JPEG and H.264, DAC 2007: 39-42
Katalin Popovici, Xavier Guérin, Frédéric Rousseau, Pier Stanislao Paolucci, Ahmed Amine Jerraya: Efficient Software Development Platforms for Multimedia Applications at Different Abstraction Levels, IEEE International Workshop on Rapid System Prototyping 2007: 113-122
Lisane B. de Brisolara, Sang-Il Han, Xavier Guérin, Luigi Carro, Ricardo Reis, Soo-Ik Chae, Ahmed Amine Jerraya: Reducing fine-grain communication overhead in multithread code generation for heterogeneous MPSoC, SCOPES 2007: 81-89
2006
Sang-Il Han, Xavier Guérin, Soo-Ik Chae, Ahmed Amine Jerraya: Buffer memory optimization for video codec application modeled in Simulink, DAC 2006: 689-694
Patents
2020
Xavier R. Guérin: Speculative execution in a distributed streaming system, United States US10657091B2
2019
Xavier R. Guérin, Scott Schneider, Xiang Ni: Adaptive locking in elastic threading systems, United States US20190377582A1
Xavier R. Guérin, Shicong Meng: Passive two-phase commit system for high-performance distributed transaction execution, United States US10296371B2
2018
Xavier R. Guérin, Shicong Meng: Multi-way, zero-copy, passive transaction log collection in distributed transaction systems, United States US20150347243A1
2016
Yuqing Gao, Xavier R. Guérin, Xiaoqiao Meng, Tiia Salo: Remote direct memory access (RDMA) high performance producer-consumer message processing, United States 9,495,325
Xavier R. Guérin, Tiia J. Salo: RDMA-optimized high-performance distributed cache, United States 9,378,179
Xavier R. Guérin: Keyboard with macro keys made up of positionally adjustable micro keys, United States 9,335,830
Yuqing Gao, Xavier R. Guérin, Graeme Johnson: High performance, distributed, shared, data grid for distributed Java virtual machine runtime artifacts, United States 9,332,083
Xavier R. Guérin, Yinglong Xia: Scheduling and execution of DAG-structured computation on RDMA-connected clusters, United States 9,300,749
2015
Xavier R. Guérin: Load balancing of distributed services, United States 9,667,711
Parijat Dube, Xavier R. Guérin, Seetharami R. Seelam: Method, apparatus and computer programs providing cluster-wide page management, United States 9,170,950
Xavier R. Guérin, Xiaoqiao Meng, David P. Olshefski, John M. Tracey: Automatic pinning and unpinning of virtual pages for remote direct memory access, United States 9,037,753
Megumi Ito, Michael Dawson, Xavier R. Guérin, Seetharami Seelam: Java native interface array handling in a distributed java virtual machine, United States 8,990,790
2014
Michael H. Dawson, Parijat Dube, Liana L. Fong, Yuqing Gao, Xavier R. Guérin, Michel H. T. Hack, Megumi Ito, Graeme Johnson, Nai K. Ling, Yanbin Liu, Xiaoqiao Meng, Pramod B. Nagaraja, Seetharami R. Seelam, Wei Tan, Li Zhang: Preferential execution of method calls in hybrid systems, United States 8,843,894