Tal Remez, Ph.D.

talremez at gmail dot com

I am an AI and machine learning researcher with a PhD and experience across a variety of fields, including large language models (LLMs), optimization, visual perception, computational photography, and audio-visual methods for speech enhancement. My work has focused on applications of LLMs in audio and music generation, as well as text and code generation. Recently, I’ve explored techniques like flow matching and diffusion in latent text embeddings for tasks like chain-of-thought reasoning. With a background at Facebook AI Research (FAIR) and Google Research, I’ve led projects that pushed the boundaries of AI, including on-device audio-visual speech separation and advancements in LLMs. I’m now looking for my next challenge where I can apply my skills and collaborate with a talented team to drive meaningful outcomes.





Publications




Discrete flow matching

Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky TQ Chen, Gabriel Synnaeve, Yossi Adi, Yaron Lipman

NeurIPS 2024




The Larger the Better? Improved LLM Code-Generation via Budget Reallocation

Michael Hassid*, Tal Remez*, Jonas Gehring, Roy Schwartz, Yossi Adi

COLM 2024




Simple and controllable music generation

Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez

NeurIPS 2023




Textually pretrained speech language models

Michael Hassid*, Tal Remez*, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Défossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi

NeurIPS 2023




Revise: Self-supervised speech resynthesis with visual input for universal and generalized speech regeneration

Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi

CVPR 2023




More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech

Michael Hassid, Michelle Tadmor Ramanovich, Brendan Shillingford, Miaosen Wang, Ye Jia, Tal Remez

CVPR 2022




Improving On-Screen Sound Separation for Open Domain Videos with Audio-Visual Self-attention

Efthymios Tzinis, Scott Wisdom, Tal Remez, John R Hershey

CVPR 2022




Translatotron 2: Robust direct speech-to-speech translation

Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz

ICML 2022

Website




Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

ICLR 2021




Learning to Segment via Cut-and-Paste

Tal Remez, Jonathan Huang, Matthew Brown

ECCV 2018




Class-Aware Fully-Convolutional Gaussian and Poisson Denoising

Tal Remez, Or Litany, Raja Giryes, Alex M. Bronstein

TIP 2018




Deep Functional Maps: Structured Prediction for Dense Shape Correspondence

Or Litany, Tal Remez, Emanuele Rodolà, Alex M. Bronstein, Michael M. Bronstein

ICCV 2017




Deep Class Aware Image Denoising

Tal Remez, Or Litany, Raja Giryes and Alex M. Bronstein

ICIP 2017

GitHub





Deep Convolutional Denoising of Low-Light Images

Tal Remez, Or Litany, Raja Giryes and Alex M. Bronstein




Cloud Dictionary: Sparse Coding and Modeling for Point Clouds

Or Litany*, Tal Remez*, Alex Bronstein

SPARS 2017




ASIST: Automatic Semantically Invariant Scene Transformation

Or Litany, Tal Remez, Daniel Freedman, Lior Shapira, Alex Bronstein, Ran Gal

CVIU 2017




A picture is worth a billion bits: Real-time image reconstruction from dense binary threshold pixels

Tal Remez, Or Litany, Alex Bronstein

ICCP 2016 Oral




Image reconstruction from dense binary pixels

Or Litany*, Tal Remez*, Alex Bronstein

SPARS 2015