Search Results for author: Leonid Karlinsky

Found 58 papers, 33 papers with code

Using body-anchored priors for identifying actions in single images

no code implementations • NeurIPS 2010 • Leonid Karlinsky, Michael Dinerstein, Shimon Ullman

The task is easy for humans but difficult for current approaches to object recognition, because action instances may be similar in terms of body pose, and often require detailed examination of relations between participating objects and body parts in order to be recognized.

Object Recognition

Fine-Grained Recognition of Thousands of Object Categories With Single-Example Training

1 code implementation • CVPR 2017 • Leonid Karlinsky, Joseph Shtok, Yochay Tzur, Asaf Tzadok

We approach the problem of fast detection and recognition of a large number (thousands) of object categories while training on a very limited number of examples, usually one per category.

RepMet: Representative-based metric learning for classification and one-shot object detection

1 code implementation • 12 Jun 2018 • Leonid Karlinsky, Joseph Shtok, Sivan Harary, Eli Schwartz, Amit Aides, Rogerio Feris, Raja Giryes, Alex M. Bronstein

Distance metric learning (DML) has been successfully applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples.

Classification Few-Shot Object Detection +5

Delta-encoder: an effective sample synthesis method for few-shot object recognition

1 code implementation • NeurIPS 2018 • Eli Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Rogerio Feris, Abhishek Kumar, Raja Giryes, Alex M. Bronstein

Our approach is based on a modified auto-encoder, denoted Delta-encoder, that learns to synthesize new samples for an unseen category just by seeing a few examples from it.

Ranked #2 on Few-Shot Image Classification on CIFAR100 5-way (1-shot)

Few-Shot Image Classification Few-Shot Learning +1

Co-regularized Alignment for Unsupervised Domain Adaptation

no code implementations • NeurIPS 2018 • Abhishek Kumar, Prasanna Sattigeri, Kahini Wadhawan, Leonid Karlinsky, Rogerio Feris, William T. Freeman, Gregory Wornell

Deep neural networks, trained with large amounts of labeled data, can fail to generalize well when tested with examples from a target domain whose distribution differs from the training data distribution, referred to as the source domain.

Unsupervised Domain Adaptation

LaSO: Label-Set Operations networks for multi-label few-shot learning

2 code implementations • CVPR 2019 • Amit Alfassy, Leonid Karlinsky, Amit Aides, Joseph Shtok, Sivan Harary, Rogerio Feris, Raja Giryes, Alex M. Bronstein

We conduct numerous experiments showing promising results for the label-set manipulation capabilities of the proposed approach, both directly (using the classification and retrieval metrics), and in the context of performing data augmentation for multi-label few-shot learning.

Data Augmentation Few-Shot Learning +2

RepMet: Representative-Based Metric Learning for Classification and Few-Shot Object Detection

1 code implementation • CVPR 2019 • Leonid Karlinsky, Joseph Shtok, Sivan Harary, Eli Schwartz, Amit Aides, Rogerio Feris, Raja Giryes, Alex M. Bronstein

Classification Few-Shot Object Detection +4

Baby steps towards few-shot learning with multiple semantics

no code implementations • 5 Jun 2019 • Eli Schwartz, Leonid Karlinsky, Rogerio Feris, Raja Giryes, Alex M. Bronstein

Learning from one or few visual examples is one of the key capabilities of humans since early infancy, but is still a significant challenge for modern AI systems.

Ranked #9 on Few-Shot Image Classification on Mini-ImageNet - 1-Shot Learning (using extra training data)

Few-Shot Image Classification Few-Shot Learning

MetAdapt: Meta-Learned Task-Adaptive Architecture for Few-Shot Classification

no code implementations • 1 Dec 2019 • Sivan Doveh, Eli Schwartz, Chao Xue, Rogerio Feris, Alex Bronstein, Raja Giryes, Leonid Karlinsky

In this work, we propose to employ tools inspired by the Differentiable Neural Architecture Search (D-NAS) literature in order to optimize the architecture for FSL without over-fitting.

Classification Few-Shot Learning +2

A Broader Study of Cross-Domain Few-Shot Learning

2 code implementations • ECCV 2020 • Yunhui Guo, Noel C. Codella, Leonid Karlinsky, James V. Codella, John R. Smith, Kate Saenko, Tajana Rosing, Rogerio Feris

Extensive experiments on the proposed benchmark are performed to evaluate state-of-the-art meta-learning approaches, transfer learning approaches, and newer methods for cross-domain few-shot learning.

Ranked #3 on Cross-Domain Few-Shot on Plantae

cross-domain few-shot learning Few-Shot Image Classification +1

TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification

1 code implementation • ECCV 2020 • Moshe Lichtenstein, Prasanna Sattigeri, Rogerio Feris, Raja Giryes, Leonid Karlinsky

The field of Few-Shot Learning (FSL), or learning from very few (typically 1 or 5) examples per novel class (unseen during training), has received a lot of attention and significant performance advances in the recent literature.

Few-Shot Learning General Classification

StarNet: towards Weakly Supervised Few-Shot Object Detection

1 code implementation • 15 Mar 2020 • Leonid Karlinsky, Joseph Shtok, Amit Alfassy, Moshe Lichtenstein, Sivan Harary, Eli Schwartz, Sivan Doveh, Prasanna Sattigeri, Rogerio Feris, Alexander Bronstein, Raja Giryes

Few-shot detection and classification have advanced significantly in recent years.

Classification Few-Shot Learning +5

OnlineAugment: Online Data Augmentation with Less Domain Knowledge

1 code implementation • ECCV 2020 • Zhiqiang Tang, Yunhe Gao, Leonid Karlinsky, Prasanna Sattigeri, Rogerio Feris, Dimitris Metaxas

First is that most if not all modern augmentation search methods are offline and learning policies are isolated from their usage.

Data Augmentation Image Classification

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

1 code implementation • ECCV 2020 • Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, Rogerio Feris

Specifically, given a video frame, a policy network is used to decide what input resolution should be used for processing by the action recognition model, with the goal of improving both accuracy and efficiency.

Action Recognition

Fine-grained Angular Contrastive Learning with Coarse Labels

1 code implementation • CVPR 2021 • Guy Bukchin, Eli Schwartz, Kate Saenko, Ori Shahar, Rogerio Feris, Raja Giryes, Leonid Karlinsky

A very practical example of C2FS is when the target classes are sub-classes of the training classes.

Contrastive Learning Few-Shot Learning +1

A Maximal Correlation Approach to Imposing Fairness in Machine Learning

no code implementations • 30 Dec 2020 • Joshua Lee, Yuheng Bu, Prasanna Sattigeri, Rameswar Panda, Gregory Wornell, Leonid Karlinsky, Rogerio Feris

As machine learning algorithms grow in popularity and diversify to many industries, ethical and legal concerns regarding their fairness have become increasingly relevant.

BIG-bench Machine Learning Fairness

AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

no code implementations • ICLR 2021 • Yue Meng, Rameswar Panda, Chung-Ching Lin, Prasanna Sattigeri, Leonid Karlinsky, Kate Saenko, Aude Oliva, Rogerio Feris

Temporal modelling is the key for efficient video action recognition.

Action Recognition Temporal Action Localization

Self-Supervised Classification Network

2 code implementations • 19 Mar 2021 • Elad Amrani, Leonid Karlinsky, Alex Bronstein

To avoid degenerate solutions (i.e., solutions where all labels are assigned to the same class), we propose a mathematically motivated variant of the cross-entropy loss that has a uniform prior asserted on the predicted labels.

Ranked #3 on Unsupervised Image Classification on ImageNet

Classification Clustering +5

A Broad Study on the Transferability of Visual Representations with Contrastive Learning

2 code implementations • ICCV 2021 • Ashraful Islam, Chun-Fu Chen, Rameswar Panda, Leonid Karlinsky, Richard Radke, Rogerio Feris

Tremendous progress has been made in visual representation learning, notably with the recent success of self-supervised contrastive learning methods.

Contrastive Learning object-detection +2

Detector-Free Weakly Supervised Grounding by Separation

1 code implementation • ICCV 2021 • Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky

In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector.

Ranked #1 on Phrase Grounding on Visual Genome

Phrase Grounding

Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data

1 code implementation • NeurIPS 2021 • Ashraful Islam, Chun-Fu Chen, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Richard J. Radke

As the base dataset and unlabeled dataset are from different domains, projecting the target images in the class-domain of the base dataset with a fixed pretrained model might be sub-optimal.

cross-domain few-shot learning

CHARTER: heatmap-based multi-type chart data extraction

no code implementations • 28 Nov 2021 • Joseph Shtok, Sivan Harary, Ophir Azulai, Adi Raz Goldfarb, Assaf Arbelle, Leonid Karlinsky

The digital conversion of information stored in documents is a great source of knowledge.

Vocal Bursts Type Prediction

Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data

no code implementations • 30 Nov 2021 • Samarth Mishra, Rameswar Panda, Cheng Perng Phoo, Chun-Fu Chen, Leonid Karlinsky, Kate Saenko, Venkatesh Saligrama, Rogerio S. Feris

It is thus better to tailor synthetic pre-training data to a specific downstream task, for best performance.

Unsupervised Domain Generalization by Learning a Bridge Across Domains

1 code implementation • CVPR 2022 • Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio Feris, Leonid Karlinsky

The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system.

Domain Generalization Self-Supervised Learning

Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data

no code implementations • CVPR 2022 • Samarth Mishra, Rameswar Panda, Cheng Perng Phoo, Chun-Fu (Richard) Chen, Leonid Karlinsky, Kate Saenko, Venkatesh Saligrama, Rogerio S. Feris

It is thus better to tailor synthetic pre-training data to a specific downstream task, for best performance.

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens

no code implementations • 13 Jun 2022 • Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

We explore a particular instantiation of scene structure, namely a Hand-Object Graph, consisting of hands and objects with their locations as nodes, and physical relations of contact/no-contact as edges.

Action Recognition Video Understanding

Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022

no code implementations • 15 Jun 2022 • Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

First, as both images and videos contain structured information, we enrich a transformer model with a set of object tokens that can be used across images and videos.

Point-of-no-return (PNR) temporal localization Temporal Localization

FETA: Towards Specializing Foundation Models for Expert Task Applications

1 code implementation • 8 Sep 2022 • Amit Alfassy, Assaf Arbelle, Oshri Halimi, Sivan Harary, Roei Herzig, Eli Schwartz, Rameswar Panda, Michele Dolfi, Christoph Auer, Kate Saenko, Peter W. J. Staar, Rogerio Feris, Leonid Karlinsky

However, as we show in this paper, FMs still have poor out-of-the-box performance on expert tasks (e.g., retrieving technical illustrations from car manuals using language queries), for which the data is either unseen or belongs to a long-tail part of the data distribution of the huge datasets used for FM pre-training.

Ranked #1 on Image-to-Text Retrieval on FETA Car-Manuals

Domain Generalization Image Retrieval +6

VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models

1 code implementation • 12 Sep 2022 • Felix Vogel, Nina Shvetsova, Leonid Karlinsky, Hilde Kuehne

We follow up with the analysis of the attribute-based zero-shot learning capabilities of these models, evaluating how well this classical zero-shot notion emerges from large-scale webly supervision.

Attribute Retrieval +2

Contrastive Audio-Visual Masked Autoencoder

1 code implementation • 2 Oct 2022 • Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass

In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.

Ranked #1 on Audio Tagging on AudioSet (using extra training data)

Audio Classification Audio Tagging +4

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

1 code implementation • 7 Oct 2022 • Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass

Inspired by the fact that English text-video retrieval outperforms other languages, we train a student model using input text in different languages to match the cross-modal predictions from teacher models using input text in English.

Knowledge Distillation Retrieval +2

On the Importance of Calibration in Semi-supervised Learning

no code implementations • 10 Oct 2022 • Charlotte Loh, Rumen Dangovski, Shivchander Sudalairaj, Seungwook Han, Ligong Han, Leonid Karlinsky, Marin Soljacic, Akash Srivastava

State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data by combining techniques of consistency regularization and pseudo-labeling.

ConStruct-VL: Data-Free Continual Structured VL Concepts Learning

1 code implementation • CVPR 2023 • James Seale Smith, Paola Cascante-Bonilla, Assaf Arbelle, Donghyun Kim, Rameswar Panda, David Cox, Diyi Yang, Zsolt Kira, Rogerio Feris, Leonid Karlinsky

This leads to reasoning mistakes, which need to be corrected as they occur by teaching VL models the missing SVLC skills; often this must be done using private data where the issue was found, which naturally leads to a data-free continual (no task-id) VL learning setting.

Teaching Structured Vision&Language Concepts to Vision&Language Models

1 code implementation • 21 Nov 2022 • Sivan Doveh, Assaf Arbelle, Sivan Harary, Rameswar Panda, Roei Herzig, Eli Schwartz, Donghyun Kim, Raja Giryes, Rogerio Feris, Shimon Ullman, Leonid Karlinsky

Vision and Language (VL) models have demonstrated remarkable zero-shot performance in a variety of tasks.

On the Transferability of Visual Features in Generalized Zero-Shot Learning

1 code implementation • 22 Nov 2022 • Paola Cascante-Bonilla, Leonid Karlinsky, James Seale Smith, Yanjun Qi, Vicente Ordonez

Generalized Zero-Shot Learning (GZSL) aims to train a classifier that can generalize to unseen classes, using a set of attributes as auxiliary information, and the visual features extracted from a pre-trained convolutional neural network.

Generalized Zero-Shot Learning Knowledge Distillation +2

CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning

1 code implementation • CVPR 2023 • James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, Zsolt Kira

Our experiments show that we outperform the current SOTA method DualPrompt on established benchmarks by as much as 4.5% in average final accuracy.

Continual Learning Novel Concepts

MAEDAY: MAE for few and zero shot AnomalY-Detection

1 code implementation • 25 Nov 2022 • Eli Schwartz, Assaf Arbelle, Leonid Karlinsky, Sivan Harary, Florian Scheidegger, Sivan Doveh, Raja Giryes

We propose using Masked Auto-Encoder (MAE), a transformer model self-supervisedly trained on image inpainting, for anomaly detection (AD).

Anomaly Detection Image Inpainting +4

PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data

no code implementations • 8 Dec 2022 • Roei Herzig, Ofir Abramovich, Elad Ben-Avraham, Assaf Arbelle, Leonid Karlinsky, Ariel Shamir, Trevor Darrell, Amir Globerson

In this work, we propose an approach to leverage synthetic scene data for improving video understanding.

Action Recognition Video Understanding

Teaching Structured Vision & Language Concepts to Vision & Language Models

1 code implementation • CVPR 2023 • Sivan Doveh, Assaf Arbelle, Sivan Harary, Eli Schwartz, Roei Herzig, Raja Giryes, Rogerio Feris, Rameswar Panda, Shimon Ullman, Leonid Karlinsky

Vision and Language (VL) models have demonstrated remarkable zero-shot performance in a variety of tasks.

Learning to Grow Pretrained Models for Efficient Transformer Training

no code implementations • 2 Mar 2023 • Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, Philip Greengard, Leonid Karlinsky, Rogerio Feris, David Daniel Cox, Zhangyang Wang, Yoon Kim

Scaling transformers has led to significant breakthroughs in many domains, leading to a paradigm in which larger versions of existing models are trained and released on a periodic basis.

Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

no code implementations • 6 Mar 2023 • Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim

Prompt tuning, in which a base pretrained model is adapted to each task via conditioning on learned prompt vectors, has emerged as a promising approach for efficiently adapting large language models to multiple downstream tasks.

Transfer Learning

MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

1 code implementation • ICCV 2023 • Wei Lin, Leonid Karlinsky, Nina Shvetsova, Horst Possegger, Mateusz Kozinski, Rameswar Panda, Rogerio Feris, Hilde Kuehne, Horst Bischof

We adapt a VL model for zero-shot and few-shot action recognition using a collection of unlabeled videos and an unpaired action dictionary.

Ranked #3 on Zero-Shot Action Recognition on Kinetics

Few-Shot action recognition Few Shot Action Recognition +5

Going Beyond Nouns With Vision & Language Models Using Synthetic Data

1 code implementation • ICCV 2023 • Paola Cascante-Bonilla, Khaled Shehada, James Seale Smith, Sivan Doveh, Donghyun Kim, Rameswar Panda, Gül Varol, Aude Oliva, Vicente Ordonez, Rogerio Feris, Leonid Karlinsky

We contribute Synthetic Visual Concepts (SyViC), a million-scale synthetic dataset and data generation codebase that allows generating additional suitable data to improve VLC understanding and compositional reasoning of VL models.

Ranked #68 on Visual Reasoning on Winoground

Sentence Visual Reasoning

Constructive Assimilation: Boosting Contrastive Learning Performance through View Generation Strategies

no code implementations • 2 Apr 2023 • Ligong Han, Seungwook Han, Shivchander Sudalairaj, Charlotte Loh, Rumen Dangovski, Fei Deng, Pulkit Agrawal, Dimitris Metaxas, Leonid Karlinsky, Tsui-Wei Weng, Akash Srivastava

Recently, several attempts have been made to replace such domain-specific, human-designed transformations with generated views that are learned.

Contrastive Learning Representation Learning

Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs

no code implementations • 10 May 2023 • Roei Herzig, Alon Mendelson, Leonid Karlinsky, Assaf Arbelle, Rogerio Feris, Trevor Darrell, Amir Globerson

For the visual side, we incorporate a special "SG Component" in the image transformer trained to predict SG information, while for the textual side, we utilize SGs to generate fine-grained captions that highlight different compositional aspects of the scene.

Ranked #24 on Visual Reasoning on Winoground

Scene Understanding Visual Reasoning

Listen, Think, and Understand

1 code implementation • 18 May 2023 • Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass

On the other hand, modern large language models (LLMs) exhibit emerging reasoning ability but they lack audio perception capabilities.

Ranked #3 on Music Question Answering on MusicQA (using extra training data)

Language Modelling Large Language Model +1

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

no code implementations • 21 May 2023 • Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass

Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each.

TAP: Targeted Prompting for Task Adaptive Generation of Textual Training Instances for Visual Classification

1 code implementation • 13 Sep 2023 • M. Jehanzeb Mirza, Leonid Karlinsky, Wei Lin, Horst Possegger, Rogerio Feris, Horst Bischof

Vision and Language Models (VLMs), such as CLIP, have enabled visual recognition of a potentially unlimited set of categories described by text prompts.

Zero-Shot Learning

Joint Audio and Speech Understanding

1 code implementation • 25 Sep 2023 • Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass

Humans are surrounded by audio signals that include both speech and non-speech sounds.

Self-Specialization: Uncovering Latent Expertise within Large Language Models

no code implementations • 29 Sep 2023 • Junmo Kang, Hongyin Luo, Yada Zhu, James Glass, David Cox, Alan Ritter, Rogerio Feris, Leonid Karlinsky

Recent works have demonstrated the effectiveness of self-alignment in which a large language model is, by itself, aligned to follow general instructions through the automatic generation of instructional data using a handful of human-written seeds.

Hallucination Instruction Following +2

GeRA: Label-Efficient Geometrically Regularized Alignment

no code implementations • 1 Oct 2023 • Dustin Klebe, Tal Shnitzer, Mikhail Yurochkin, Leonid Karlinsky, Justin Solomon

We introduce a semi-supervised Geometrically Regularized Alignment (GeRA) method to align the embedding spaces of pretrained unimodal encoders in a label-efficient way.

Learning Human Action Recognition Representations Without Real Humans

1 code implementation • NeurIPS 2023 • Howard Zhong, Samarth Mishra, Donghyun Kim, SouYoung Jin, Rameswar Panda, Hilde Kuehne, Leonid Karlinsky, Venkatesh Saligrama, Aude Oliva, Rogerio Feris

To this end, we present, for the first time, a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.

Action Recognition Ethics +2

3VL: using Trees to teach Vision & Language models compositional concepts

no code implementations • 28 Dec 2023 • Nir Yellinek, Leonid Karlinsky, Raja Giryes

Vision-Language models (VLMs) have proved effective at aligning image and text representations, producing superior zero-shot results when transferred to many downstream tasks.

Large Scale Generative AI Text Applied to Sports and Music

no code implementations • 31 Jan 2024 • Aaron Baughman, Stephen Hammer, Rahul Agarwal, Gozde Akay, Eduardo Morales, Tony Johnson, Leonid Karlinsky, Rogerio Feris

We address the problem of scaling up the production of media content, including commentary and personalized news stories, for large-scale sports and music events worldwide.

CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory

no code implementations • 21 Feb 2024 • Zexue He, Leonid Karlinsky, Donghyun Kim, Julian McAuley, Dmitry Krotov, Rogerio Feris

Large Language Models (LLMs) struggle to handle long input sequences due to high memory and runtime costs.

In-Context Learning

Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs

1 code implementation • 18 Mar 2024 • M. Jehanzeb Mirza, Leonid Karlinsky, Wei Lin, Sivan Doveh, Jakub Micorek, Mateusz Kozinski, Hilde Kuehne, Horst Possegger

Prompt ensembling of Large Language Model (LLM) generated category-specific prompts has emerged as an effective method to enhance zero-shot recognition ability of Vision-Language Models (VLMs).

Language Modelling Large Language Model +1

Towards Multimodal In-Context Learning for Vision & Language Models

no code implementations • 19 Mar 2024 • Sivan Doveh, Shaked Perek, M. Jehanzeb Mirza, Amit Alfassy, Assaf Arbelle, Shimon Ullman, Leonid Karlinsky

Inspired by the emergence of Large Language Models (LLMs) that can truly understand human language, significant progress has been made in aligning other, non-language, modalities to be 'understandable' by an LLM, primarily via converting their samples into a sequence of embedded language-like tokens directly fed into the LLM (decoder) input stream.

In-Context Learning

NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning

no code implementations • 30 Mar 2024 • Eli Schwartz, Leshem Choshen, Joseph Shtok, Sivan Doveh, Leonid Karlinsky, Assaf Arbelle

Language models struggle with handling numerical data and performing arithmetic operations.

Language Modelling
