Tal Remez, Ph.D.

talremez at gmail dot com

I am an AI researcher with a Ph.D. in machine learning, specializing in large-scale foundation models and emerging architectures such as world models. My expertise spans multi-modal large language models (LLMs) covering video, audio, and text, as well as core areas such as optimization, visual perception, and computational photography.

My recent work at Amazon and FAIR has centered on pushing the boundaries of these models, including leading a project on large-scale multi-modal foundation models for sports understanding and developing the Code World Model (CWM). This work spans applications of multi-modal LLMs to audio and music generation and to text and code generation, along with research on flow matching and discrete flow matching for tasks such as text continuation and chain-of-thought reasoning.

Publications

CWM: An Open-Weights LLM for Research on Code Generation with World Models
Improved LLM Code-Generation
COLM 2024
The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
Michael Hassid*, Tal Remez*, Jonas Gehring, Roy Schwartz, Yossi Adi
Simple and controllable music generation
NeurIPS 2023
Simple and controllable music generation
Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez
Textually pretrained speech language models
NeurIPS 2023
Textually pretrained speech language models
Michael Hassid*, Tal Remez*, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Défossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi
In-the-Wild Visually-Driven Prosody
CVPR 2022
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Michael Hassid, Michelle Tadmor Ramanovich, Brendan Shillingford, Miaosen Wang, Ye Jia, Tal Remez
Improving On-Screen Sound Separation
CVPR 2022
Improving On-Screen Sound Separation for Open Domain Videos with Audio-Visual Self-attention
Efthymios Tzinis, Scott Wisdom, Tal Remez, John R Hershey
Translatotron 2
ICML 2022
Translatotron 2: Robust direct speech-to-speech translation
Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz
AudioScope
ICLR 2021
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey
Learning to Segment
ECCV 2018
Learning to Segment via Cut-and-Paste
Tal Remez, Jonathan Huang, Matthew Brown
Class-Aware Denoising
TIP 2018
Class-Aware Fully-Convolutional Gaussian and Poisson Denoising
Tal Remez, Or Litany, Raja Giryes, Alex M. Bronstein
Deep Functional Maps
ICCV 2017
Deep Functional Maps: Structured Prediction for Dense Shape Correspondence
Or Litany, Tal Remez, Emanuele Rodolà, Alex M. Bronstein, Michael M. Bronstein
Deep Class Aware Image Denoising
ICIP 2017
Deep Class Aware Image Denoising
Tal Remez, Or Litany, Raja Giryes, Alex M. Bronstein
Low-Light Denoising
2017
Deep Convolutional Denoising of Low-Light Images
Tal Remez, Or Litany, Raja Giryes, Alex M. Bronstein
Cloud Dictionary
SPARS 2017
Cloud Dictionary: Sparse Coding and Modeling for Point Clouds
Or Litany*, Tal Remez*, Alex Bronstein
ASIST
CVIU 2017
ASIST: Automatic Semantically Invariant Scene Transformation
Or Litany, Tal Remez, Daniel Freedman, Lior Shapira, Alex Bronstein, Ran Gal
A picture is worth a billion bits
SPARS 2015
Image reconstruction from dense binary pixels
Or Litany*, Tal Remez*, Alex Bronstein