Yannis Avrithis

Working on computer vision and machine learning

News

Oct 28, 2024
Paper accepted at WACV 2025: Composed Image Retrieval
Oct 23, 2024
Paper accepted at TASL: Power Set Mixing
Sep 23, 2024
Felipe Torres Figueroa defends his PhD thesis on Interpretable Visual Recognition
Sep 10, 2024
Bill Psomas defends his PhD thesis on Visual and Multimodal Representations
Aug 09, 2024
Opti-CAM code released
Jul 29, 2024
Paper accepted at CVIU: Saliency Maps for Interpretability
Jul 01, 2024
Shashanka Venkataramanan defends his PhD thesis on Data Augmentation
Jun 11, 2024
Deniz Engin defends her PhD thesis on Video Question Answering
May 24, 2024
Preprint released at arXiv: Composite Retrieval for Remote Sensing
May 13, 2024
UT-KD code released
May 10, 2024
Preprint released at arXiv: Unsupervised Domain Adaptation
May 06, 2024
May 01, 2024
I serve as Area Chair of BMVC 2024
Apr 29, 2024
DoRA code released
Apr 23, 2024
Preprint released at arXiv: Interpretable Gradients
Apr 23, 2024
Preprint released at arXiv: Cross Attention Stream
Apr 07, 2024
Paper accepted at XAI4CV/CVPR 2024: Cross Attention Stream
Apr 01, 2024
Preprint released at arXiv: Revisiting Google Landmarks
Mar 29, 2024
RGLDv2-clean dataset released
Mar 27, 2024
I serve as Area Chair of NeurIPS 2024
Mar 15, 2024
Paper accepted at IGARSS 2024: Composite Retrieval for Remote Sensing
Feb 26, 2024
Paper accepted at CVPR 2024: Revisiting Google Landmarks
Jan 24, 2024
AM code released
Jan 19, 2024
Jan 18, 2024
WTours dataset released
Jan 16, 2024
Paper accepted at ICLR 2024: Walking Tours
Dec 23, 2023
I become a Reviewer of ICML
Dec 22, 2023
Paper accepted at VISAPP 2024: Interpretable Gradients
Dec 19, 2023
Preprint released at arXiv: PowMix
Nov 24, 2023

About me

I am a researcher in computer vision and machine learning. Between 2022 and 2023 I was a Principal Investigator at the Institute of Advanced Research on Artificial Intelligence (IARAI). Between 2021 and 2022 I was a Research Director at the Information Management Systems Institute (IMSI) of Athena Research Center. I enjoy working on learning visual representations from data, with as little supervision as possible. My recent work focuses on different learning settings including metric learning, incremental learning and few-shot learning; multimodal learning including vision, language and 3D models; interpretability; and video question answering.

Between 2016 and 2021 I was a research scientist in the LinkMedia team of Inria Rennes-Bretagne Atlantique and taught Deep Learning for Vision at the University of Rennes 1. My work focused on exploring the manifold structure of data and, apart from image retrieval, using it for unsupervised, semi-supervised and few-shot learning. I also worked on adversarial examples and on investigating the sparsity of convolutional activations, applied to spatial matching and unsupervised object discovery. In 2020, I was awarded the Habilitation à Diriger des Recherches (HDR) qualification from the University of Rennes 1.

In 2015 I was at the Laboratory of Algebraic and Geometric Algorithms (ΕρΓΑ) of the National and Kapodistrian University of Athens (NKUA), where I worked on large-scale clustering and nearest neighbor search with Ioannis Emiris. An achievement of this period was Web-Scale Image Clustering.

Between 2001 and 2015 I was at the Image, Video and Multimedia Systems Laboratory (IVML) of the National Technical University of Athens (NTUA). In the latter part of this period, from 2008, I led the Image and Video Analysis (IVA) team. We worked on a diverse set of problems including local feature and salient region detection, spatial and spatiotemporal visual representation and matching, object recognition and tracking, action recognition in video, scene classification, and image/video indexing, retrieval and summarization.

A large body of this work focused on local feature/descriptor representations. In this context, we delivered repeatable feature detection, extremely fast spatial matching, geometry indexing, multi-view and single-view feature selection, approximate nearest neighbor search for large-scale clustering and vocabulary construction, and mining of 3D scenes from millions of images. Examples of this work are Hough Pyramid Matching (HPM), the Aggregated Selective Match Kernel (ASMK), and Locally Optimized Product Quantization (LOPQ). In 2017, Yahoo! Research chose LOPQ, combined with a CNN image representation, to index its entire Flickr collection and provide similar image search on it. A flagship product of this period was our application Visual Image Retrieval and Localization (VIRaL).

Before 2008, I worked on problems related to object detection and image understanding. This includes regions-of-interest for generic object detection, semantic segmentation, integrated segmentation and region labeling, spatio-temporal object segmentation and tracking, image classification, as well as the use of visual context and prior knowledge.

While at NTUA, I had the opportunity to collaborate with a large number of researchers across Europe by participating in the Networks of Excellence Muscle and K-Space and in Integrated Projects like aceMedia and WeKnowIt. My activities included serving as Program Chair or General Chair of workshops and conferences in the field of multimedia, like WIAMIS, CBMI, MMM and CIVR. For several years I taught Signals and Systems and Image and Video Analysis.

Between 1996 and 2001 I worked on my Ph.D. at NTUA with Stefanos Kollias, studying visual representations for video sequence analysis. I investigated accurate spatio-temporal segmentation by fusing color, motion and depth, as well as a region-based representation for retrieval and summarization in the form of key-frames. I designed an affine-invariant representation of object contours for shape-based object classification and retrieval. Finally, I studied temporal segmentation and parsing of broadcast news based on human face detection.

In 1993-1994 I completed an M.Sc. with Distinction in Communications and Signal Processing at Imperial College, University of London. As part of my Master's thesis, I worked on communications and investigated the capacity of a cellular CDMA system with Athanassios Manikas.

Between 1988 and 1993 I studied Electrical and Computer Engineering at NTUA. As part of my Diploma thesis, I worked on analog electronics and developed a hardware implementation of a fuzzy logic processor with Yannis Tsividis. Before that, my first exposure to computer vision and machine learning was the study of an invariant representation for optical character recognition with Anastasios Delopoulos, leading to my first conference publication before my Diploma.

Recent publications

PowMix: a Versatile Regularizer for Multimodal Sentiment Analysis
E. Georgiou, Y. Avrithis, A. Potamianos
TASL, In press
Composed Image Retrieval for Training-Free Domain Conversion
N. Efthymiadis, B. Psomas, Z. Laskar, K. Karantzalos, Y. Avrithis, O. Chum, G. Tolias
WACV 2025
CA-Stream: Attention-Based Pooling for Interpretable Image Recognition
F. Torres Figueroa, H. Zhang, R. Sicre, S. Ayache, Y. Avrithis
Composed Image Retrieval for Remote Sensing
B. Psomas, I. Kakogeorgiou, N. Efthymiadis, G. Tolias, O. Chum, Y. Avrithis, K. Karantzalos
On Train-Test Class Overlap and Detection for Image Retrieval
C.H. Song, J. Yoon, T. Hwang, S. Choi, Y.H. Gu, Y. Avrithis
Is Imagenet Worth 1 Video? Learning Strong Image Encoders from 1 Long Unlabelled Video
S. Venkataramanan, M.N. Rizve, J. Carreira, Y.M. Asano, Y. Avrithis
A Learning Paradigm for Interpretable Gradients
F. Torres Figueroa, H. Zhang, R. Sicre, Y. Avrithis, S. Ayache
Opti-CAM: Optimizing Saliency Maps for Interpretability
H. Zhang, F. Torres, R. Sicre, Y. Avrithis, S. Ayache
Generating Part-Aware Editable 3D Shapes without 3D Supervision
K. Tertikas, D. Paschalidou, B. Pan, J.J. Park, M.A. Uy, I. Emiris, Y. Avrithis, L. Guibas
Boosting Vision Transformers for Image Retrieval
C.H. Song, J. Yoon, S. Choi, Y. Avrithis
What to Hide from Your Students: Attention-Guided Masked Image Modeling
I. Kakogeorgiou, S. Gidaris, B. Psomas, Y. Avrithis, A. Bursuc, K. Karantzalos, N. Komodakis
AlignMixup: Improving Representations by Interpolating Aligned Features
S. Venkataramanan, E. Kijak, L. Amsaleg, Y. Avrithis
It Takes Two to Tango: Mixup for Deep Metric Learning
S. Venkataramanan, B. Psomas, E. Kijak, L. Amsaleg, K. Karantzalos, Y. Avrithis
On the Hidden Treasure of Dialog in Video Question Answering
D. Engin, N.Q.K. Duong, F. Schnitzler, Y. Avrithis
Local Propagation for Few-Shot Learning
Y. Lifchitz, Y. Avrithis, S. Picard
Smooth Adversarial Examples
H. Zhang, Y. Avrithis, T. Furon, L. Amsaleg
Label Propagation for Deep Semi-Supervised Learning
A. Iscen, G. Tolias, Y. Avrithis, O. Chum
Dense Classification and Implanting for Few-Shot Learning
Y. Lifchitz, Y. Avrithis, S. Picard, A. Bursuc
Graph-Based Particular Object Discovery
O. Siméoni, A. Iscen, G. Tolias, Y. Avrithis, O. Chum
Mining on Manifolds: Metric Learning without Labels
A. Iscen, G. Tolias, Y. Avrithis, O. Chum
Revisiting Oxford And Paris: Large-Scale Image Retrieval Benchmarking
F. Radenović, A. Iscen, G. Tolias, Y. Avrithis, O. Chum
Fast Spectral Ranking for Similarity Search
A. Iscen, Y. Avrithis, G. Tolias, T. Furon, O. Chum
Unsupervised Object Discovery for Instance Recognition
O. Siméoni, A. Iscen, G. Tolias, Y. Avrithis, O. Chum
Automatic Discovery of Discriminative Parts as a Quadratic Assignment Problem
R. Sicre, J. Rabin, Y. Avrithis, T. Furon, F. Jurie, E. Kijak
Unsupervised Part Learning for Visual Recognition
R. Sicre, Y. Avrithis, E. Kijak, F. Jurie
Panorama to Panorama Matching for Location Recognition
A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum
$\alpha$-Shapes for Local Feature Detection
C. Varytimidis, K. Rapantzikos, Y. Avrithis, S. Kollias
Web-Scale Image Clustering Revisited
Y. Avrithis, Y. Kalantidis, E. Anagnostopoulos, I. Emiris
Planar Shape Decomposition Made Simple
N. Papanelopoulos, Y. Avrithis
Improving Local Features by Dithering-Based Image Sampling
C. Varytimidis, K. Rapantzikos, Y. Avrithis, S. Kollias
Towards Large-Scale Geometry Indexing by Feature Selection
G. Tolias, Y. Kalantidis, Y. Avrithis, S. Kollias
Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention
G. Evangelopoulos, A. Zlatintsi, A. Potamianos, P. Maragos, K. Rapantzikos, G. Skoumas, Y. Avrithis
Scalable Triangulation-Based Logo Recognition
Y. Kalantidis, L.G. Pueyo, M. Trevisiol, R. van Zwol, Y. Avrithis
VIRaL: Visual Image Retrieval and Localization
Y. Kalantidis, G. Tolias, Y. Avrithis, M. Phinikettos, E. Spyrou, Ph. Mylonas, S. Kollias