Yannis Avrithis - Code and Data (by author)

Y. Avrithis

PyNet

Python; BSD license; 2016

PyNet is a minimal Python library for dynamic automatic differentiation. Its focus is on simplicity, and it is meant to accompany the differentiation lecture of the Deep Learning for Vision course. It provides a tape-based automatic differentiation mechanism similar to that of PyTorch, allowing dynamic computational graph creation in plain Python code, including loops, conditionals, etc. The initial implementation included both a CPU backend in NumPy and a GPU backend in Neon. This version includes only the CPU backend and is meant for educational purposes.
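The tape-based mechanism can be illustrated with a minimal sketch in plain Python, assuming scalar values only. The class `Var`, the functions `exp` and `backward` and the global `TAPE` are hypothetical and are not PyNet's API. Each operation appends itself to the tape as it executes, so the graph is built by ordinary Python control flow; the backward pass replays the tape in reverse and applies the chain rule.

```python
import math

# Global tape: operations are appended in execution order (hypothetical design).
TAPE = []

class Var:
    """A scalar value that records the operations applied to it."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        out = Var(self.value + other.value)
        # Local derivatives: d(out)/d(self) = 1 and d(out)/d(other) = 1.
        TAPE.append((out, [(self, 1.0), (other, 1.0)]))
        return out

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        out = Var(self.value * other.value)
        # Local derivatives: d(out)/d(self) = other, d(out)/d(other) = self.
        TAPE.append((out, [(self, other.value), (other, self.value)]))
        return out

    __rmul__ = __mul__

def exp(x):
    out = Var(math.exp(x.value))
    TAPE.append((out, [(x, out.value)]))  # d(exp x)/dx = exp x
    return out

def backward(y):
    """Reverse-mode pass: replay the tape backwards, accumulating gradients."""
    y.grad = 1.0
    for out, parents in reversed(TAPE):
        for parent, local in parents:
            parent.grad += out.grad * local

if __name__ == "__main__":
    TAPE.clear()
    x = Var(2.0)
    y = Var(0.0)
    for _ in range(3):  # a plain Python loop builds the graph dynamically
        y = y + x * x   # y = 3 x^2, so dy/dx = 6 x
    backward(y)
    print(y.value, x.grad)  # 12.0 12.0
```

Because the tape is filled in execution order, replaying it in reverse guarantees that every node's gradient is complete before it is propagated to its inputs. The tape keeps growing across calls, so it is cleared before each new computation.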

Inverted-quantized $k$-means

Y. Avrithis

IQM

Matlab, C++; based on AGM, Yael, xio; BSD license; published in ICCV 2015; 2015

E. Anagnostopoulos: Framework and baselines for small-scale (1M) experiments.
Y. Kalantidis: Framework and baselines for large-scale (100M) experiments. This includes extraction of CNN features for the 100M collection using Caffe and a distributed $k$-means baseline implemented in Spark.

IQM is a highly efficient clustering algorithm that operates on an extremely compressed data representation, for instance 26 bits per vector. It is a variant of $k$-means that quantizes vectors and uses inverted search from centroids to cells, while dynamically determining the number of clusters, following AGM. Using global CNN image representations, IQM scales up to clustering a collection of 100M images in less than an hour on a single processor.
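The benefit of clustering quantized data can be sketched as follows: once vectors are quantized, many of them fall into the same cell, so $k$-means can iterate over distinct cells weighted by their population rather than over individual vectors. This is only an illustration under stated assumptions: a plain scalar grid stands in for IQM's quantizer, each cell is assigned exhaustively to its nearest centroid instead of by IQM's inverted search from centroids to cells, and the number of clusters is fixed rather than determined dynamically. The function names are hypothetical.

```python
from collections import Counter

def quantize(points, step):
    """Map each point to a grid cell (a tuple of integers); a stand-in quantizer."""
    return [tuple(round(c / step) for c in p) for p in points]

def cell_kmeans(points, centroids, step=0.5, iters=10):
    # Collapse the data: each distinct cell is stored once, with its population.
    cells = Counter(quantize(points, step))
    # Represent each cell by its center in the original space.
    centers = {cell: tuple(c * step for c in cell) for cell in cells}
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        sums = [[0.0] * len(centroids[0]) for _ in centroids]
        counts = [0] * len(centroids)
        for cell, n in cells.items():
            x = centers[cell]
            # Assign the whole cell (n vectors at once) to its nearest centroid.
            j = min(range(len(centroids)),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])))
            counts[j] += n
            for d, a in enumerate(x):
                sums[j][d] += n * a
        # Weighted mean update; empty clusters keep their previous position.
        centroids = [tuple(s / counts[j] for s in sums[j]) if counts[j] else centroids[j]
                     for j in range(len(centroids))]
    return centroids, len(cells)
```

The cost of each iteration depends on the number of occupied cells, not on the number of vectors, which is what makes very coarse codes attractive at the 100M scale.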

Dimensionality-recursive vector quantization

Y. Avrithis

DRVQ

C++; based on ivl; BSD license; published in ICCV 2013; 2013

DRVQ is a fast vector quantization method for high-dimensional Euclidean spaces under arbitrary data distributions. It is an approximation of $k$-means that is practically constant in data size and applies to arbitrarily high dimensions, but it can only scale to a few thousand centroids. As a by-product of training, a tree structure performs either exact or approximate quantization on the trained centroids, the latter being less precise but extremely fast. The combination of C++ recursive virtual functions for the tree implementation with Matlab-like syntax for matrix operations has allowed fast prototyping, readable code and optimal performance in one piece of software.

ivl2

Y. Avrithis

ivl2

C++11, header-only; GNU LGPL2/3 license; 2013-2014

ivl2 is an effort to redesign and reimplement ivl in the C++11 language standard. In contrast to ivl, which targeted wide adoption, this is an experimental effort that exploits the latest progress in the language to simplify its implementation and generalize its functionality and syntax. It makes full use of new features including variadic templates, template aliases, type inference, rvalue references and move semantics.

Its design is centered around a small number of orthogonal concepts that can be combined in arbitrary ways to yield an extremely powerful syntax. Among others, it offers a unique extension of the standard <type_traits> library and std::tuple, going far beyond the standard design to support views, expression templates, algorithms, and a unique common interface to tuples and static/dynamic arrays. It generalizes C++ iterators and D ranges. It overloads all C++ operators and functions to automatically support arbitrary combinations of scalars, arrays or tuples in an arbitrary number of arguments, which is not possible with ivl or any C++98 code.

ivl2 is a complex and abstract piece of software consisting of hundreds of source files. It offers a unique blend of features not currently available in any other library or language. It is written from scratch and the code is clean, organized and optimized.

Approximate Gaussian mixture (Demo version)

Y. Avrithis

AGM

Matlab; published in ECCV 2012; 2011-2012

AGM is a clustering method that combines the flexibility of Gaussian mixtures with the scaling properties needed to construct large visual vocabularies for image retrieval. The algorithm can dynamically estimate the number of clusters, which is referred to as expanding Gaussian mixture (EGM). It also provides significant speed-up by employing approximate nearest neighbor search in assigning points to clusters, which is referred to as approximate Gaussian mixture (AGM). This is a demo version on a toy 2D example. The production version is not public.

Hough pyramid matching (Internal version)

Y. Avrithis

HPM-int

C++; based on HPM, OpenCV; published in ICCV 2011, IJCV 2014; 2012

This is an internal version of HPM. The dependence on ivl has been removed and the code has been integrated with OpenCV data structures for local features. It is available upon request.

Feature map hashing/similarity (Demo version)

Y. Avrithis

FMH

Matlab; published in ACM-MM 2010, CVIU 2014; 2009

FMH is a method for image indexing and retrieval, which integrates appearance with global image geometry in the indexing process, while enjoying robustness against viewpoint change, photometric variations, occlusion, and background clutter. To handle its increased memory requirements, hashing was subsequently replaced by an automated and unsupervised feature selection model, leading to feature map similarity (FMS). This version is a prototype of the original idea on a toy 2D example. It is available upon request.

Medial feature detector

Y. Avrithis

MFD

C++; based on ivl, OpenCV, VGG Affine Features; published in ICCV 2011; 2009-2011

MFD is a local feature detector. Given an input image, it gives access to all intermediate results, including a (weighted) distance transform, a (weighted) medial axis, an image partition generalizing the topological watershed, and the detected features with optional descriptors computed by the VGG software. MFD also provides detailed statistics through several commands and options, including interactive visualization and debugging. It can operate in batch mode, optionally recursing into subfolders. It has a special mode for binary images with a faster implementation, useful for computing the binary distance transform and medial axis; in this mode it also offers sub-pixel accuracy. The code is highly optimized, with running times on the order of 0.5 seconds for a 1-megapixel image. It comes with 15 pages of documentation.
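For readers unfamiliar with the binary distance transform mentioned above, here is a textbook two-pass sketch using the city-block (L1) metric. It is not MFD's weighted or sub-pixel transform; the function name and the choice of metric are assumptions made for illustration.

```python
INF = float("inf")

def distance_transform_l1(image):
    """City-block distance from every pixel to the nearest foreground (nonzero) pixel.

    Two raster scans: the forward scan propagates distances from the top/left
    neighbours, the backward scan from the bottom/right neighbours.
    """
    h, w = len(image), len(image[0])
    d = [[0 if image[i][j] else INF for j in range(w)] for i in range(h)]
    # Forward pass: top-left to bottom-right.
    for i in range(h):
        for j in range(w):
            if i > 0:
                d[i][j] = min(d[i][j], d[i - 1][j] + 1)
            if j > 0:
                d[i][j] = min(d[i][j], d[i][j - 1] + 1)
    # Backward pass: bottom-right to top-left.
    for i in range(h - 1, -1, -1):
        for j in range(w - 1, -1, -1):
            if i < h - 1:
                d[i][j] = min(d[i][j], d[i + 1][j] + 1)
            if j < w - 1:
                d[i][j] = min(d[i][j], d[i][j + 1] + 1)
    return d
```

For the L1 metric these two passes are exact; the medial axis can then be read off as the ridges (local maxima) of the resulting distance map.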

Asymmetric metric learning

M. Budnik (advised by Y. Avrithis)

AML

PyTorch; based on CIR-torch; published in CVPR 2021; 2020-2021

Focusing on instance-level image retrieval, we study an asymmetric testing task, where the database is represented by the teacher and queries by the student. Inspired by this task, we introduce a novel paradigm of using asymmetric representations during training. This acts as a simple combination of knowledge transfer with the original metric learning task. The code allows the reproduction of the results of our CVPR 2021 paper.
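The asymmetric testing task can be sketched as follows: database images are embedded once, offline, by the large teacher, while each query is embedded online by the lightweight student, and ranking uses cosine similarity across the two. The `student` and `teacher` callables are hypothetical placeholders for trained networks that map into a shared embedding space; training them is not shown.

```python
import math

def _cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def asymmetric_search(query_images, database_images, student, teacher):
    """Rank the database for each query in the asymmetric setting.

    Database items go through the teacher, queries through the student;
    both are assumed to map into the same embedding space.
    """
    db = [teacher(x) for x in database_images]  # computed once, offline
    rankings = []
    for q in query_images:
        e = student(q)  # computed online, cheaply
        rankings.append(sorted(range(len(db)), key=lambda i: -_cosine(e, db[i])))
    return rankings
```

The design choice being illustrated is that only the query side has to be cheap; the database side can afford the heavier model because its cost is paid once.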

Semi-supervised active learning

M. Budnik, O. Siméoni (advised by Y. Avrithis)

SSAL

PyTorch; based on DeepCluster, DLP; MIT license; published in ICPR 2020; 2019-2020

This is a deep active learning framework allowing a systematic evaluation of different acquisition functions with or without methods that make use of the unlabeled data during model training. In particular, this includes (i) unsupervised pre-training, as implemented by DeepCluster, and (ii) semi-supervised learning, as implemented by our deep label propagation. The code allows the reproduction of the results of our ICPR 2020 paper.

Video Question Answering with Multi-Modal Prompts

D. Engin (advised by Y. Avrithis)

ViTiS

PyTorch; based on FrozenBiLM, P-tuning-v2; Apache-2.0 license; published in CLVL/ICCV 2023; 2023

ViTiS is a parameter-efficient method for adapting large-scale pretrained vision-language models to limited data, addressing challenges such as overfitting, catastrophic forgetting, and the cross-modal gap between vision and language. It combines multimodal prompt learning with a transformer-based mapping network, while keeping the pretrained models frozen. We apply it to zero-shot and few-shot video question answering. The code allows the reproduction of the results of our CLVL/ICCV 2023 paper.

Video question answering

D. Engin (advised by N. Q. K. Duong, F. Schnitzler, Y. Avrithis)

VideoQA

PyTorch; based on ROLL-VideoQA; proprietary license; published in ICCV 2021; 2021

This is a Video Question Answering (VideoQA) method, where we address understanding of stories in video such as movies and TV shows from raw data, without external sources like plot synopses, scripts, video descriptions or knowledge bases. We treat dialog as a noisy source to be converted into text description via dialog summarization, much like recent methods treat video. The input of each modality is encoded by transformers independently, then we fuse all modalities using soft temporal attention for localization over long inputs. The code allows the reproduction of the results of our ICCV 2021 paper.

Graph convolutional cleaning

A. Iscen, G. Tolias (advised by O. Chum, C. Schmid, Y. Avrithis)

GCC

PyTorch; based on GCN, FAISS; Apache-2.0 license; published in ECCV 2020; 2019-2020

We learn a classifier from few clean and many noisy labels. The structure of clean and noisy data is modeled by a graph per class and graph convolutional networks are used to predict class relevance of noisy examples. This cleaning method is evaluated on an extended version of a few-shot learning problem, where the few clean examples of novel classes are supplemented with additional noisy data. The code allows the reproduction of the results of our ECCV 2020 paper.

Deep label propagation

A. Iscen (advised by G. Tolias, O. Chum, Y. Avrithis)

DLP

PyTorch; based on Mean Teacher, FAISS; MIT license; published in CVPR 2019; 2018-2019

DLP is a modern deep learning approach and an inductive version of classic label propagation for semi-supervised learning based on manifold similarity. The code allows the reproduction of the results of our CVPR 2019 paper.
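The classic, transductive label propagation that DLP builds upon can be sketched in a few lines. This is not DLP itself (which combines propagation with training a network); it is the standard iteration $F \leftarrow \alpha S F + (1-\alpha) Y$ with $S = D^{-1/2} W D^{-1/2}$. The function name and the defaults for `alpha` and `iters` are assumptions; `iters` should be large enough that `alpha ** iters` is negligible, otherwise the result has not converged.

```python
import math

def label_propagation(W, labels, n_classes, alpha=0.9, iters=100):
    """Classic graph label propagation (not DLP itself).

    W      -- symmetric affinity matrix (list of lists, zero diagonal)
    labels -- class index for labeled nodes, None for unlabeled nodes
    Returns a predicted class index for every node.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Symmetrically normalized affinity: S = D^{-1/2} W D^{-1/2}.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    # One-hot seed matrix Y; rows of unlabeled nodes are all zero.
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        # F <- alpha * S F + (1 - alpha) * Y
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    # Pseudo-label: the class with the largest propagated score.
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]
```

On a path graph 0-1-2-3 with node 0 labeled as class 0 and node 3 as class 1, the two inner nodes inherit the label of the nearer seed.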

Mining on manifolds

A. Iscen, G. Tolias (advised by O. Chum, Y. Avrithis)

MoM

PyTorch, MatConvNet; based on MLbench, DLP; published in CVPR 2018; 2017-2018

MoM is one of the very few self-supervised metric learning methods. Building on findings of manifold similarity, it learns a representation space where Euclidean neighbors are determined according to manifold neighbors in the original feature space. It is applied to fine-grained classification as well as particular object retrieval.

Revisiting Oxford and Paris

F. Radenovic, A. Iscen (advised by G. Tolias, O. Chum, Y. Avrithis)

RevOP

Matlab, Python; based on Oxford5k, Paris6k; published in CVPR 2018; 2017

RevOP is an image retrieval benchmark. It is the result of revisiting the two most popular image retrieval datasets, Oxford5k and Paris6k. We provide new annotation for both datasets, with extra attention to the reliability of the ground truth. All co-authors independently annotated the entire dataset; the final annotation is the result of merging all individual contributions with an automated voting process. We introduce 15 new, more difficult queries per dataset and update the evaluation protocol by introducing three new settings of varying difficulty. We also create a new set of one million challenging distractors. The package includes Matlab and Python code to download and process the data and evaluate results on the new benchmark.
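Evaluation settings of this kind rely on average precision computed while ignoring some database images (junk), so that they neither help nor hurt a query. The sketch below is plain non-interpolated AP; which images count as positives and which as junk in each of the three settings is defined by the benchmark's own ground-truth files and is not reproduced here, and the package's evaluation code may differ in interpolation details. The function names are hypothetical.

```python
def average_precision(ranking, positives, junk=()):
    """Non-interpolated AP of a ranked list, ignoring junk items.

    ranking   -- database ids sorted by decreasing similarity to the query
    positives -- ids that count as correct for this query and setting
    junk      -- ids removed from the ranking before evaluation
    """
    positives, junk = set(positives), set(junk)
    if not positives:
        raise ValueError("query has no positives")
    hits, precision_sum, rank = 0, 0.0, 0
    for item in ranking:
        if item in junk:
            continue  # junk neither helps nor hurts
        rank += 1
        if item in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(positives)

def mean_average_precision(rankings, positives, junk):
    """mAP over queries; the three arguments are parallel lists, one entry per query."""
    aps = [average_precision(r, p, j) for r, p, j in zip(rankings, positives, junk)]
    return sum(aps) / len(aps)
```

For the ranking `a, j, b, c` with positives `{a, c}`, treating `j` as junk raises AP from 0.75 to 5/6, because `c` moves from rank 4 to rank 3.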

Graph-based object discovery

O. Siméoni, A. Iscen (advised by G. Tolias, O. Chum, Y. Avrithis)

GOD

MatConvNet; based on Diffusion, AGM; published in WACV 2018, MVA 2019; 2017-2018

GOD captures discriminative patterns from regional CNN activations of an entire dataset, suppressing background clutter. A saliency measure is defined, based on a centrality measure of a nearest neighbor graph constructed from regional CNN representations of dataset images. Salient regions are then detected using an extended version of expanding Gaussian mixture. The code is not public yet.

Instance retrieval benchmark 2

A. Iscen (advised by G. Tolias, Y. Avrithis)

INSTRE2

Based on INSTRE; published in CVPR 2017; 2016

This is a new version of the INSTRE benchmark for instance-level object retrieval and recognition. It has been developed as part of our work on diffusion. In particular, we re-host the dataset at Inria because the original version is unavailable, introduce a new evaluation protocol that is in line with other well-known datasets, and provide a rich set of baselines to facilitate comparisons.

Diffusion for image retrieval

A. Iscen, G. Tolias (advised by O. Chum, T. Furon, Y. Avrithis)

Diffusion

Matlab; based on Yael; GNU GPL3+ license; published in CVPR 2017; 2016-2017

Diffusion is a manifold search method that uses a random walk on the nearest neighbor graph of a dataset. It has been extended to a spectral approach and a hybrid variant of the two for image retrieval. The code and data allow the reproduction of the results of our CVPR 2017 paper. In particular, we provide the descriptors used and the necessary ground-truth files for mAP evaluation. We also make available the approximate $k$-NN graph computed off-line for large-scale datasets.
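A minimal sketch of diffusion-based ranking, under stated assumptions: descriptors are L2-normalized, the graph keeps only reciprocal $k$-NN edges with similarities raised to a power `gamma`, and the ranking scores solve $f = \alpha S f + (1-\alpha) y$ by fixed-point iteration, where `y` holds the query's similarities to the database. The parameter values and function names are illustrative, and the actual code may use a different solver (for instance conjugate gradient) and graph construction details.

```python
import math

def knn_graph(X, k, gamma=3.0):
    """Reciprocal k-NN affinity matrix from L2-normalized vectors (illustrative parameters)."""
    n = len(X)
    sims = [[max(0.0, sum(a * b for a, b in zip(X[i], X[j]))) for j in range(n)]
            for i in range(n)]
    nbrs = [set(sorted((j for j in range(n) if j != i), key=lambda j: -sims[i][j])[:k])
            for i in range(n)]
    # Keep an edge only if each endpoint is among the other's k nearest neighbours.
    return [[sims[i][j] ** gamma if (j in nbrs[i] and i in nbrs[j]) else 0.0
             for j in range(n)] for i in range(n)]

def diffuse(W, query_sims, alpha=0.9, iters=100):
    """Rank database items by f = alpha*S f + (1-alpha)*y, solved by fixed-point iteration."""
    n = len(W)
    deg = [sum(row) or 1.0 for row in W]  # guard isolated nodes against division by zero
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    y = list(query_sims)
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return sorted(range(n), key=lambda i: -f[i])
```

Items reachable from the query along the graph receive a positive score even when they are far from it in the original space, while disconnected items score zero; this is the sense in which the random walk follows the data manifold.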

Aggregated selective match kernel (Python version)

T. Jenicek, G. Tolias (advised by O. Chum)

ASMK-py

Python; based on ASMK, CIR-torch, FAISS; MIT license; published in ICCV 2013, HOW 2020

This is a Python implementation of ASMK. There are minor differences compared to the original ASMK method (ICCV 2013) and Matlab implementation, which are described in the HOW paper (ECCV 2020).
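The core of the aggregated selective match kernel can be sketched as follows: residuals of local descriptors assigned to the same visual word are summed and L2-normalized, and two images are compared by applying the selectivity function $\sigma_\alpha(u) = \mathrm{sign}(u)|u|^\alpha$ if $u > \tau$ (and $0$ otherwise) to each shared word's similarity. This is a plain-Python illustration without binarization (ASMK*) or the final image-level normalization; the defaults `alpha=3.0`, `tau=0.0` and all function names are assumptions, and the package itself may differ as noted above.

```python
import math

def _l2_normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n > 0 else v

def aggregate(residuals_by_word):
    """Sum the residuals assigned to each visual word and L2-normalize the sum."""
    agg = {}
    for word, residuals in residuals_by_word.items():
        total = [sum(col) for col in zip(*residuals)]
        agg[word] = _l2_normalize(total)
    return agg

def selectivity(u, alpha=3.0, tau=0.0):
    """sigma_alpha(u) = sign(u)|u|^alpha if u > tau, else 0."""
    return math.copysign(abs(u) ** alpha, u) if u > tau else 0.0

def asmk_similarity(agg_x, agg_y, alpha=3.0, tau=0.0):
    """Sum the selective kernel over visual words present in both images.

    The image-level normalization factor is omitted in this sketch.
    """
    score = 0.0
    for word in agg_x.keys() & agg_y.keys():
        u = sum(a * b for a, b in zip(agg_x[word], agg_y[word]))
        score += selectivity(u, alpha, tau)
    return score
```

Raising $u$ to the power $\alpha$ suppresses weak matches much more than strong ones, and the threshold $\tau$ discards them entirely.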

Simple attention-based pooling

B. Psomas, I. Kakogeorgiou (advised by K. Karantzalos, Y. Avrithis)

SimPool

PyTorch; based on AttMask, DINO, ConvNeXt, DETR, timm, Metrix; Apache-2.0 license; published in ICCV 2023; 2023

SimPool is a simple attention-based pooling mechanism that replaces the default pooling of both convolutional and transformer encoders. Whether supervised or self-supervised, it improves performance on pre-training and downstream tasks and provides attention maps delineating object boundaries in all cases. SimPool is the first method to obtain attention maps in supervised transformers of at least the same quality as in self-supervised ones, without explicit losses or modifying the architecture. The code allows the reproduction of the results of our ICCV 2023 paper.
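The general idea of attention-based pooling can be sketched as follows: a query is formed by global average pooling of the patch features, each patch is scored by a scaled dot product with that query, and the pooled vector is the softmax-weighted average of the patches. This is a simplified illustration, not SimPool's exact formulation: the learned projections and normalization layers are omitted, and the function name is hypothetical.

```python
import math

def attention_pool(features):
    """Attention-based pooling of n patch features (each a list of d floats).

    Simplified sketch: no learned projections or normalization layers.
    Returns the pooled vector and the attention weight of every patch.
    """
    n, d = len(features), len(features[0])
    # Query: global average pooling of all patch features.
    q = [sum(f[k] for f in features) / n for k in range(d)]
    # Scaled dot-product scores of the query against each patch (the keys).
    scores = [sum(a * b for a, b in zip(q, f)) / math.sqrt(d) for f in features]
    # Numerically stable softmax over patches.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    attn = [e / z for e in exps]
    # Pooled vector: attention-weighted average of the patch features.
    pooled = [sum(a * f[k] for a, f in zip(attn, features)) for k in range(d)]
    return pooled, attn
```

The returned weights form the attention map over patches; with identical patches they are uniform and the operation reduces to global average pooling.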

Attention-guided masked image modeling

I. Kakogeorgiou, B. Psomas (advised by S. Gidaris, Y. Avrithis, K. Karantzalos, N. Komodakis)

AttMask

PyTorch; based on iBOT, DINO, BEiT, ImageNet-9; Apache-2.0 license; published in ECCV 2022; 2022

In the context of self-supervised pretraining of vision transformers, this is a masking strategy that can be used as an alternative to random masking for dense distillation-based masked image modeling (MIM), as well as for plain distillation-based self-supervised learning on classification tokens. In particular, in the distillation-based setting, a teacher transformer encoder generates an attention map, which we use to guide masking for the student. The code allows the reproduction of the results of our ECCV 2022 paper.
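Attention-guided masking can be contrasted with random masking in a short sketch: given one attention score per patch from the teacher, one simple variant masks the most attended patches for the student. The assumption that the scores come from the teacher's [CLS] attention averaged over heads, the `mask_ratio` default and the function name are illustrative choices, not the project's API.

```python
import random

def attention_guided_mask(attention, mask_ratio=0.5, mode="high", seed=None):
    """Return a boolean mask over patches (True = masked for the student).

    attention -- one non-negative score per patch, e.g. the teacher's [CLS]
                 attention averaged over heads (an assumption of this sketch)
    mode      -- "high": mask the most attended patches; "random": baseline
    """
    n = len(attention)
    n_mask = int(round(mask_ratio * n))
    if mode == "high":
        order = sorted(range(n), key=lambda i: attention[i], reverse=True)
    elif mode == "random":
        order = list(range(n))
        random.Random(seed).shuffle(order)
    else:
        raise ValueError(f"unknown mode: {mode}")
    chosen = set(order[:n_mask])
    return [i in chosen for i in range(n)]
```

Masking the most attended patches hides the most informative content from the student, which makes the reconstruction or distillation target harder than with random masking at the same ratio.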

Locally optimized product quantization (Production version)

C. Melina, Y. Kalantidis

LOPQ-prod

Python, Spark; based on LOPQ; Apache-2.0 license; published in CVPR 2014; 2016-2017

This is a Python/Spark implementation of LOPQ. On top of CNN features, it has been used to power image similarity search on the entire Flickr collection.

Locally optimized product quantization (Demo version)

Y. Kalantidis (advised by Y. Avrithis)

LOPQ

Published in CVPR 2014; 2013-2014

Off-line learning: Matlab; based on Yael
On-line search: Python, C++; based on ivl

LOPQ is a method for approximate nearest neighbor search that has remained state of the art for several years at a scale of one billion vectors. Leveraging the very same data structure that is used to provide non-exhaustive search, that is, inverted lists or a multi-index, the idea is to locally optimize an individual product quantizer per cell and use it to encode residuals. Local optimization is over rotation and space decomposition. This code is for demonstration only. Pre-computing projections for all queries is only done to facilitate parameter tuning and is suboptimal.
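The encoding step described above can be sketched as follows: a vector is assigned to its nearest coarse centroid (its cell in the inverted lists), the residual is split into equal-size subspaces, and each sub-vector is quantized with codebooks that belong to that cell only. The per-cell rotation that LOPQ also optimizes is omitted, codebooks are given rather than learned, and the data layout and function names are hypothetical.

```python
def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _nearest(v, centroids):
    return min(range(len(centroids)), key=lambda i: _sqdist(v, centroids[i]))

def lopq_encode(x, coarse, local_codebooks):
    """Encode x as (cell, sub-codes) with a per-cell product quantizer.

    coarse          -- list of coarse centroids (the cells of the inverted index)
    local_codebooks -- local_codebooks[cell][m] is the list of sub-centroids for
                       the m-th subspace of residuals falling in that cell
    The per-cell rotation that LOPQ also optimizes is omitted here.
    """
    cell = _nearest(x, coarse)
    residual = [a - b for a, b in zip(x, coarse[cell])]
    books = local_codebooks[cell]
    sub = len(residual) // len(books)  # equal-size subspaces
    codes = [_nearest(residual[m * sub:(m + 1) * sub], books[m]) for m in range(len(books))]
    return cell, codes

def lopq_decode(cell, codes, coarse, local_codebooks):
    """Approximate reconstruction: coarse centroid plus concatenated sub-centroids."""
    books = local_codebooks[cell]
    residual = [v for m, c in enumerate(codes) for v in books[m][c]]
    return [a + b for a, b in zip(coarse[cell], residual)]
```

Because the codebooks are local, residuals in each cell are quantized with sub-centroids fitted to that cell's own distribution, rather than with one global product quantizer shared by all cells.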

Feature Selection by Symmetry

G. Tolias, Y. Kalantidis (advised by Y. Avrithis)

SymCity

C++; based on HPM, ivl; published in ACM-MM 2012; 2012

To reduce the space required for the index in large scale search, several methods focus on feature selection based on multiple views. In practice however, most images are unique, in the sense that they depict a unique view of an object or scene in the dataset and there is nothing to compare to. SymCity selects features in such unique images by self-similarity. In effect, we detect repeating patterns or local symmetries and select the participating features. The method itself is a variant of HPM, called Hough pyramid self-matching (HPSM), and maintains the same retrieval performance using only 20% of the memory otherwise required. The code is not public.

Approximate Gaussian mixture (Production version)

Y. Kalantidis (advised by Y. Avrithis)

AGM-prod

C++; based on ivl, FLANN; published in ECCV 2012; 2011-2012

This is the production version of AGM, allowing the reproduction of the results of our ECCV 2012 paper. The code is not public.

Flickr Logos 27

Y. Kalantidis, L.G. Pueyo, M. Trevisiol (advised by R. van Zwol, Y. Avrithis)

Logos27

Based on Flickr Identity + Logo Design; published in ICMR 2011; 2011

This is an annotated logo dataset downloaded from the Flickr group Identity + Logo Design, containing more than 4000 logo classes/brands in total. It consists of a training, a distractor and a query set, containing respectively 810 images with bounding boxes labeled into 27 classes, 4207 logo images/classes depicting clean logos, and 270 images, half of which are annotated into the 27 training classes while the other half do not depict logos.

European Cities 1M

Y. Kalantidis, G. Tolias (advised by Y. Avrithis)

EC1M

Based on Flickr; published in ACM-MM 2010; 2010

EC1M consists of 909k geo-tagged images from 22 European cities, crawled from Flickr using geographic queries covering a window of each city center. A subset of 1,081 images from Barcelona is annotated into 35 groups depicting the same scene; 17 of the groups are landmark scenes and 18 are non-landmark. Annotation is based respectively on tags and visual search / manual clean-up. In total, 157 of those images are defined as queries (up to 5 per group). Images of the remaining 21 cities are used as distractors. Most depict urban scenery like the ground-truth, making a challenging distractor dataset.

Scene maps

Y. Kalantidis, E. Spyrou, G. Tolias (advised by Y. Avrithis)

Scene-maps

C++; based on ivl, LPSolve; published in ACM-MM 2010, MTAP 2011; 2010-2011

Scene maps refers to a representation of image collections used for large scale image search and mining, and applied to location and landmark recognition. Starting from a geo-tagged dataset, we first group images geographically and then visually, where each visual cluster is assumed to depict different views of the same scene. We align all views to one reference image and construct a 2D scene map by preserving details from all images while discarding repeating visual features. A scene map thus collectively represents a scene as seen from different viewpoints. The indexing, retrieval and spatial matching scheme then operates directly on scene maps. All clustering operations are based on kernel vector quantization (KVQ). The code is not public.

European Cities 50k

G. Tolias, Y. Kalantidis (advised by Y. Avrithis)

EC50k

Based on Flickr; published in ACM-MM 2010; 2010

EC50k consists of 50,767 geo-tagged images from 14 European cities, crawled from Flickr using geographic queries covering a window of each city center. A subset of 778 images from 9 cities is annotated into 20 groups depicting the same scene. Annotation is based on tags and visual search / manual clean-up. In total, 100 of those images are defined as queries (5 per group). Images of the remaining 5 cities are used as distractors. Most depict urban scenery like the ground-truth, making a challenging distractor dataset.

Feature map hashing/similarity (Production version)

G. Tolias, Y. Kalantidis (advised by Y. Avrithis)

FMH-prod

C++; based on ivl, FMH; published in ACM-MM 2010, CVIU 2014; 2009-2012

This is the production version of FMH. The code is not public.

Visual Image Retrieval and Localization

Y. Kalantidis, G. Tolias, M. Phinikettos, E. Spyrou, Ph. Mylonas (advised by Y. Avrithis)

VIRaL

Based on Flickr; published in ACM-MM 2010, MTAP 2011; 2008-2012

Application interface: M. Phinikettos; PHP, Javascript; 2008-2012
Core search engine: Y. Kalantidis, G. Tolias; C++; 2008-2012
Explore/Routes: Y. Kalantidis, G. Tolias; C++, PHP, Javascript; based on Scene-maps; 2011

VIRaL is a visual search engine available online since 2008. The query is an image, either uploaded, fetched from a given URL, or chosen from its database. Given this single image, it retrieves visually similar images and estimates its location on the map. It also suggests tags that may be attached to the query image, identifies known landmarks or points of interest, and provides links to relevant Wikipedia articles. Its database contains 2.7M Flickr images from 43 cities in the world. It is able to recognize tens of thousands of landmarks.

Additional applications enhance its user experience. VIRaL Explore enables browsing of the entire VIRaL image collection on the world map. Starting in a given city or at any zoom level on the map, it places icons corresponding to grouped photos, along with landmark names and Wikipedia links, if applicable. Photos are grouped off-line according to whether they depict the same object, building, or scene, and most popular groups are shown on the map, according to zoom level. VIRaL Routes offers a unique browsing experience of personal photo collections. Collections are processed off-line to identify where they were taken and group them by scene; a route is then constructed on the map, showing icons of visited places.

VIRaL targets the general public to demonstrate the results of our research. It has been disseminated in several technical and wide-audience venues. It is a unique application, and one of the very few non-commercial CBIR engines listed by Wikipedia that are actually operating online.

IVA visual representation, matching and search infrastructure

G. Tolias, Y. Kalantidis (advised by Y. Avrithis)

iva

C++; based on ivl, OpenCV; 2008-2014

This is a collection of software that has been used internally as infrastructure within the IVA research team for several other projects, most notably Scene-maps, FMH, HPM, SymCity, AGM, DRVQ and VIRaL. It provides a common interface to frequently used data structures and a number of individual software components to support common tasks. Such tasks include local feature detection and descriptor computation, nearest neighbor search and clustering, aggregated representations like histograms and sparse sets used e.g. for bag-of-words and related models, matching methods including pyramid matching, algorithms like radix sort, set operations like intersection and unique element count, inverted file structures for indexing, as well as dataset organization and evaluation protocols.

Most of the software uses ivl, which itself evolved to support the needs of the software. In many cases OpenCV is also required, but otherwise dependencies are kept to a minimum and constrained to individual components. The software includes dozens of individual components and hundreds of source files. Each component is typically accompanied by a sample project for Linux and Windows demonstrating its use. The code is not public.

ivl

K. Kontosis, N. Skalkotos, S. Nathanail (advised by Y. Avrithis)

ivl

C++; GNU LGPL2/3 license; 2007-2013

2007: S. Nathanail
2008: N. Skalkotos
2009-2013: K. Kontosis

ivl-lina: N. Skalkotos; Linear algebra (LAPACK); 2008
ivl-cv: K. Kontosis; Computer vision (OpenCV); 2009-2010
ivl-qt: K. Kontosis; GUI (Qt); 2011-2012

ivl is a header-only template C++98 general-purpose library with a convenient and powerful syntax. It extends C++ syntax towards mathematical notation, while making use of language features like classes, functions, operators, templates and type safety. It allows simple and expressive statements, while taking care of the underlying representation and optimization. Often resembling a new language, it targets abstract, concise, readable, yet efficient code. It supports the principle that the path from theory through rapid prototyping to production-quality software should be as short as possible. In fact, the actual code should not differ much from pseudocode.

ivl features static and dynamic arrays, ranges, tuples, matrices, images and function objects supporting multiple return arguments, left/right overloading, function pipelining and vectorization, expression templates, automatic lazy evaluation, and dynamic multi-threading. Other features include sub-arrays and other lazy views of one- or multi-dimensional arrays and tuples, STL-compatible and multidimensional iterators, and extended compound operators. It is easy to use, with most syntax being self-explanatory. It is fully optimized, with minimal or no runtime overhead, no temporaries or copies, and with most expressions boiling down to a single for loop.

ivl core is a header-only library, with no need for separate linking. It is fully templated, supporting user-defined types. Separate modules are available that smoothly integrate with LAPACK, OpenCV and Qt for linear algebra, computer vision and GUI respectively. In each case, ivl shares its data representation with the underlying external library and combines its convenient syntax with a rich collection of software. Separate linking is needed for the modules used, since the external libraries are not template-based.

The library is available as open source under a dual LGPL3.0 and GPL2.0 license at SourceForge and at its dedicated web site, which includes extended examples and documentation. A unique article, ivl by example, explains in fewer than eight pages how to build a randomized decision forest classifier from scratch with ivl, including the complete code of just 120 lines. The article and code behave like one entity, as in literate programming.

Over the years, ivl has been influenced by several C++ numerical libraries, for instance Eigen, or Boost.Multi-Array and Boost.Tuple for data representation and manipulation. At a more foundational level, it includes its own template metaprogramming library similar to Boost.MPL, heavily used for code optimization. A great motivation has been the Matlab language syntax, and in this sense a related project is Armadillo. Most of this syntax is supported, without the computational overhead and other known issues. In fact, ivl provides a unique integration of all the above functionalities.

Adaptive Manifold for Imbalanced Transductive Few-Shot Learning

M. Lazarou (advised by T. Stathaki, Y. Avrithis)

AM

PyTorch; based on α-TIM, TIM, iLPC, S2M2_fewshot; MIT license; published in WACV 2024; 2024

Adaptive Manifold is an algorithm for transductive few-shot learning on class-imbalanced data. It exploits the underlying manifold of the labeled examples and unlabeled queries by using manifold similarity to predict the class probability distribution of every query. It is parameterized by one centroid per class and a set of manifold parameters that determine the manifold. All parameters are optimized by minimizing a loss function that can be tuned towards class-balanced or imbalanced distributions. The code allows the reproduction of the results of our WACV 2024 paper.

Adaptive anchor label propagation

M. Lazarou (advised by T. Stathaki, Y. Avrithis)

A2LP

PyTorch; based on iLPC, LR+ICI, S2M2_fewshot; MIT license; published in ICIP 2023; 2023

In the context of transductive inference for few-shot learning, label propagation infers pseudo-labels for unlabeled data by using a graph that exploits the manifold structure of the data. Adaptive anchor label propagation (A2LP) is an algorithm that adapts the feature embeddings of the labeled data by minimizing a differentiable loss function, optimizing their positions in the manifold in the process. The code allows the reproduction of the results of our ICIP 2023 paper.

Iterative label propagation and cleaning

M. Lazarou (advised by T. Stathaki, Y. Avrithis)

iLPC

PyTorch; based on S2M2, PT-MAP, LR+ICI, CloserLook, MCT; MIT license; published in ICCV 2021; 2021

This is an algorithm for transductive and semi-supervised few-shot learning. It leverages the manifold structure of the labeled and unlabeled data distribution to predict pseudo-labels, while balancing over classes and using the loss value distribution of a limited-capacity classifier to select the cleanest labels, iteratively improving the quality of pseudo-labels. The code allows the reproduction of the results of our ICCV 2021 paper.

Tensor feature hallucination

M. Lazarou (advised by T. Stathaki, Y. Avrithis)

TFH

PyTorch; based on RFS, IDeMe-Net, Dual TriNet, S2M2, DeepEMD; MIT license; published in WACV 2022; 2022

This is a simple synthetic data generation method for few-shot learning. It involves a simple loss function for training a feature generator, and it learns to generate tensor features instead of vector features. The code allows the reproduction of the results of our WACV 2022 paper.

Few-shot few-shot learning

Y. Lifchitz (advised by S. Picard, Y. Avrithis)

FSFSL

PyTorch; published in ICPR 2020; 2019

We depart from the standard setting of few-shot learning in that the representation is obtained from a classifier pre-trained on a large-scale dataset of a different domain, while the base class data are limited to few examples per class and their role is to adapt the representation to the domain at hand rather than learn from scratch. In doing so, we obtain from the pre-trained classifier a spatial attention map that allows focusing on objects and suppressing background clutter. The code is not public.

Dense classification and implanting

Y. Lifchitz (advised by A. Bursuc, S. Picard, Y. Avrithis)

DCI

PyTorch; based on FSwF; published in CVPR 2019; 2018

Dense classification over feature maps studies for the first time local activations in the domain of few-shot learning. Implanting, that is, attaching new neurons to a previously trained network to learn new, task-specific features, achieves for the first time fine-tuning of the entire network to convergence without overfitting on novel classes. The code is not public.

Locally optimized product quantization (Production version)

C. Melina, Y. Kalantidis

LOPQ-prod

560

Python, Spark based on LOPQ Apache-2.0 license published in CVPR 2014 2016-2017

This is a Python/Spark implementation of LOPQ. On top of CNN features, it has been used to power image similarity search on the entire Flickr collection.

Visual Image Retrieval and Localization

Y. Kalantidis, G. Tolias, M. Phinikettos, E. Spyrou, Ph. Mylonas (advised by Y. Avrithis)

VIRaL

based on Flickr published in ACM-MM 2010, MTAP 2011 2008-2012

Application interface: M. Phinikettos PHP, Javascript 2008-2012
Core search engine: Y. Kalantidis, G. Tolias C++ 2008-2012
Explore/Routes: Y. Kalantidis, G. Tolias C++, PHP, Javascript based on Scene-maps 2011

VIRaL is a visual search engine available online since 2008. The query is an image, either uploaded, fetched from a given URL, or chosen from the its database. Given this single image, it retrieves visually similar images and estimates its location on the map. It also suggests tags that may be attached to the query image, identifies known landmarks or points of interest, and provides links to relevant Wikipedia articles. Its database contains 2.7M Flickr images from 43 cities in the world. It is able to recognize tens of thousands of landmarks.

Additional applications enhance its user experience. VIRaL Explore enables browsing of the entire VIRaL image collection on the world map. Starting in a given city or at any zoom level on the map, it places icons corresponding to grouped photos, along with landmark names and Wikipedia links, if applicable. Photos are grouped off-line according to whether they depict the same object, building, or scene, and the most popular groups are shown on the map, depending on the zoom level. VIRaL Routes offers a unique way of browsing personal photo collections. Collections are processed off-line to identify where the photos were taken and to group them by scene; a route is then constructed on the map, showing icons of the visited places.

VIRaL targets the general public, demonstrating the results of our research. It has been disseminated in several technical and wide-audience venues. It is a unique application, and one of the very few non-commercial CBIR engines listed by Wikipedia that are actually operating online.

ivl

K. Kontosis, N. Skalkotos, S. Nathanail (advised by Y. Avrithis)

ivl

C++ GNU LGPL2/3 license 2007-2013

2007: S. Nathanail
2008: N. Skalkotos
2009-2013: K. Kontosis

ivl-lina: N. Skalkotos Linear algebra (LAPACK) 2008
ivl-cv: K. Kontosis Computer vision (OpenCV) 2009-2010
ivl-qt: K. Kontosis GUI (Qt) 2011-2012

ivl is a header-only, template-based C++98 general-purpose library with convenient and powerful syntax. It extends C++ syntax towards mathematical notation, making use of language features like classes, functions, operators, templates and type safety. It allows simple and expressive statements, while taking care of the underlying representation and optimization. Often resembling a new language, it targets abstract, concise, readable, yet efficient code. It follows the principle that the path from theory through rapid prototyping to production-quality software should be as short as possible. In fact, the actual code should not differ much from pseudocode.

ivl features static and dynamic arrays, ranges, tuples, matrices, images and function objects supporting multiple return arguments, left/right overloading, function pipelining and vectorization, expression templates, automatic lazy evaluation, and dynamic multi-threading. Other features include sub-arrays and other lazy views of one- or multi-dimensional arrays and tuples, STL-compatible and multidimensional iterators, and extended compound operators. It is easy to use, with most syntax being self-explanatory. It is fully optimized, with minimal or no runtime overhead, no temporaries or copies, and with most expressions boiling down to a single for loop.

ivl core is a header-only library, with no need for separate linking. It is fully templated, supporting user-defined types. Separate modules are available that smoothly integrate with LAPACK, OpenCV and Qt for linear algebra, computer vision and GUI respectively. In each case, ivl shares its data representation with the underlying external library and combines its convenient syntax with a rich collection of software. Separate linking is needed for the modules used, since the external libraries are not templated.

The library is available as open source under a dual LGPL3.0 and GPL2.0 license at SourceForge and at its dedicated web site, which includes extended examples and documentation. A unique article, ivl by example, explains in fewer than eight pages how to build a randomized decision forest classifier from scratch with ivl, including the complete code of just 120 lines. The article and code behave like one entity, as in literate programming.

Over the years, ivl has been influenced by several C++ numerical libraries, for instance Eigen, or Boost.Multi-Array and Boost.Tuple for data representation and manipulation. At a more foundational level, it includes its own template metaprogramming library similar to Boost.MPL, heavily used for code optimization. A great motivation has been the Matlab language syntax, and in this sense a related project is Armadillo. Most of this syntax is supported, without the computational overhead and other known issues. In fact, ivl provides a unique integration of all the above functionalities.

Neural architecture growing, pruning and search

T. Neitthoffer (advised by Y. Avrithis)

NAGP

PyTorch based on SDN, SNIP 2020

The goal of neural architecture search is to automatically find the optimal network architecture, that is, the optimal succession and interconnection of layers. This is an intractable combinatorial optimization problem. We define a fully-dense super-network, out of which we select the most important connections by pruning. Still, training a deep super-network is not practical, so we devise a greedy algorithm: We grow the super-network a few layers at a time, training it and pruning its connections at each iteration. The code allows the reproduction of the results of the 2020 MSc thesis.
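
The pruning step can be sketched as follows. The sketch assumes SNIP-style connection sensitivity |w * dL/dw| as the importance of each connection of the super-network. In the greedy scheme described above, such a step would follow each round of growing and training. The function is a hypothetical illustration, not the NAGP code.

```python
import numpy as np

def snip_prune(weights, grads, keep_ratio):
    """Return a boolean mask keeping the connections with the largest
    connection sensitivity |w * dL/dw| (SNIP-style saliency).

    weights, grads: arrays of the same shape, e.g. the super-network's
    connection weights and the loss gradient with respect to them.
    """
    saliency = np.abs(weights * grads)
    k = max(1, int(round(keep_ratio * saliency.size)))
    threshold = np.sort(saliency, axis=None)[-k]   # k-th largest saliency
    return saliency >= threshold                   # ties may keep a few extra
```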

Simple attention-based pooling

B. Psomas, I. Kakogeorgiou (advised by K. Karantzalos, Y. Avrithis)

SimPool

90

PyTorch based on AttMask, DINO, ConvNeXt, DETR, timm, Metrix Apache-2.0 license published in ICCV 2023 2023

SimPool is a simple attention-based pooling mechanism that replaces the default pooling of both convolutional and transformer encoders. Whether training is supervised or self-supervised, it improves performance on pre-training and downstream tasks and provides attention maps delineating object boundaries in all cases. SimPool is the first method to obtain attention maps in supervised transformers of at least as good quality as in self-supervised ones, without explicit losses or modifications to the architecture. The code allows the reproduction of the results of our ICCV 2023 paper.
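
The core idea of attention-based pooling can be sketched in NumPy. The sketch assumes that the query is initialized by global average pooling and that both keys and values are the local features themselves. The actual SimPool formulation includes learned projections and further details described in the paper, all of which are omitted here. `attention_pool` is a hypothetical name.

```python
import numpy as np

def attention_pool(features):
    """Pool N local features (N, D) into one vector (D,) by attention.

    Simplified sketch: the query is the global average of the features,
    keys and values are the features themselves (no learned projections).
    Returns the pooled vector and the (N,) attention map.
    """
    n, d = features.shape
    query = features.mean(axis=0)                 # GAP initializes the query
    scores = features @ query / np.sqrt(d)        # scaled dot-product similarity
    scores = scores - scores.max()                # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()  # softmax over positions
    return attn @ features, attn                  # weighted average, attention map
```

Unlike plain global average pooling, the pooled vector is a weighted average, and the weights themselves form the attention map.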

Mixup for Deep Metric Learning

B. Psomas, S. Venkataramanan (advised by E. Kijak, L. Amsaleg, K. Karantzalos, Y. Avrithis)

Metrix

3

PyTorch based on Proxy Anchor, PyTorch Metric Learning, DML Benchmark published in ICLR 2022 2022

Metric Mix, or Metrix, is an algorithm for mixup-based interpolation as a data augmentation method for metric learning. It uses a generalized formulation that encompasses existing metric learning loss functions and modifies it to accommodate mixup. Mixing takes place in the input space, in intermediate representations, and in the embedding space, and it applies to both examples and target labels. The code allows the reproduction of the results of our ICLR 2022 paper.
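
A minimal sketch of mixup in the embedding space follows. It assumes that a positive and a negative embedding of the same anchor are mixed with factor `lam`, which mixup commonly samples from a Beta distribution. The mixed target is interpolated in the same way and then scored by a contrastive loss with a soft target. The loss and the function names are illustrative choices, not the exact Metrix formulation.

```python
import numpy as np

def mix_positive_negative(pos, neg, lam):
    """Mix a positive and a negative embedding of the same anchor.

    The mixed embedding is re-normalized to the unit sphere; its soft target
    is lam, interpolating the positive target (1) and the negative target (0).
    """
    mixed = lam * pos + (1 - lam) * neg
    return mixed / np.linalg.norm(mixed), lam

def soft_contrastive_loss(anchor, emb, target, margin=0.5):
    """Contrastive loss with a soft target in [0, 1]: the positive term pulls emb
    towards the anchor, the negative term pushes it beyond the margin."""
    d = np.linalg.norm(anchor - emb)
    return target * d ** 2 + (1 - target) * max(0.0, margin - d) ** 2
```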

Attention-guided masked image modeling

I. Kakogeorgiou, B. Psomas (advised by S. Gidaris, Y. Avrithis, K. Karantzalos, N. Komodakis)

AttMask

56

PyTorch based on iBOT, DINO, BEiT, ImageNet-9 Apache-2.0 license published in ECCV 2022 2022

In the context of self-supervised pretraining of vision transformers, this is a masking strategy that can be used as an alternative to random masking for dense distillation-based masked image modeling (MIM), as well as for plain distillation-based self-supervised learning on classification tokens. In particular, in the distillation-based setting, a teacher transformer encoder generates an attention map, which we use to guide masking for the student. The code allows the reproduction of the results of our ECCV 2022 paper.
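
Attention-guided masking can be sketched as follows. The sketch assumes that the teacher provides one attention value per patch token (for instance from its [CLS] token) and that the most attended tokens are masked for the student. Only this simplest variant is shown, and the function is a hypothetical illustration.

```python
import numpy as np

def attmask_high(attention, mask_ratio):
    """Boolean mask hiding the most-attended patch tokens.

    attention: (N,) teacher attention over N patch tokens.
    Returns an (N,) bool array, True = masked for the student.
    """
    n = attention.shape[0]
    k = int(round(mask_ratio * n))
    order = np.argsort(-attention)      # tokens by descending attention
    mask = np.zeros(n, dtype=bool)
    mask[order[:k]] = True
    return mask
```

Compared with random masking, this hides the regions the teacher considers most informative, which makes the student's reconstruction or distillation task harder.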

Flickr Logos 27

Y. Kalantidis, L.G. Pueyo, M. Trevisiol (advised by R. van Zwol, Y. Avrithis)

Logos27

based on Flickr Identity + Logo Design published in ICMR 2011 2011

This is an annotated logo dataset downloaded from the Flickr group Identity + Logo Design, containing more than 4000 logo classes/brands in total. It consists of a training, a distractor and a query set, containing respectively 810 images with bounding boxes labeled into 27 classes, 4207 logo images/classes depicting clean logos, and 270 images, half of which are annotated into the 27 training classes while the other half do not depict logos.

Revisiting Oxford and Paris

F. Radenovic, A. Iscen (advised by G. Tolias, O. Chum, Y. Avrithis)

RevOP

240

Matlab, Python based on Oxford5k, Paris6k published in CVPR 2018 2017

RevOP is an image retrieval benchmark, resulting from revisiting the two most popular image retrieval datasets, Oxford5k and Paris6k. We provide new annotation for both datasets, paying extra attention to the reliability of the ground truth. All co-authors independently annotated the entire dataset; the final annotation is the result of merging all individual contributions with an automated voting process. We introduce 15 new, more difficult queries per dataset and update the evaluation protocol by introducing three new settings of varying difficulty. We also create a new set of one million challenging distractors. The package includes Matlab and Python code to download and process the data and to evaluate results on the new benchmark.
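
Evaluation is based on mAP. Below is a minimal sketch of average precision for a single query. It assumes that images marked as junk for that query are simply skipped in the ranking. The official evaluation code may interpolate precision differently, so it should be used for any reported numbers.

```python
def average_precision(ranking, positives, junk=()):
    """Non-interpolated average precision of a ranked list of database ids.

    Junk images are removed from the ranking before scoring, so they count
    neither as positives nor as negatives.
    """
    positives, junk = set(positives), set(junk)
    hits, precision_sum, rank = 0, 0.0, 0
    for item in ranking:
        if item in junk:
            continue                      # ignored entirely
        rank += 1
        if item in positives:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / len(positives) if positives else 0.0
```

mAP is then the mean of this value over all queries of a given setting.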

Spatiotemporal feature detector

K. Rapantzikos (advised by Y. Avrithis)

SFD

Matlab published in CVPR 2009, CC 2011 2008-2009

This is a local feature detector originally applied to action recognition and then to salient event detection and movie summarization. It uses a multi-scale volumetric representation of the video and involves spatiotemporal operations at the voxel level. Saliency is computed by a global minimization process subject to purely volumetric constraints, each related to an informative visual aspect, namely spatial proximity, scale and feature similarity (intensity, color, motion). Points are selected as the extrema of the saliency response and strike a good balance between density and informativeness. The code is not public.

$k$-d Generalized Randomized Forests

G. Samaras (advised by I. Emiris, Y. Avrithis)

GeRaF

10

C++ header-only BSD license published in CGI 2016 2015-2016

$k$-d GeRaF is a data structure and algorithm for approximate nearest neighbor search in high dimensions. It improves randomized forests by introducing new randomization techniques to specify a set of independently constructed trees where search is performed simultaneously, hence increasing accuracy. We omit backtracking, and we optimize distance computations.
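
The following is a generic randomized k-d forest in NumPy. Search descends each tree once, without backtracking, and then ranks the union of the reached leaves by exact distance. The randomization used here is a classic choice, not the new techniques introduced by GeRaF: the split dimension is drawn among the highest-variance ones and the split is at the median. `leaf_size` and `top_dims` are hypothetical parameters.

```python
import numpy as np

class RandomizedKDTree:
    """k-d tree whose split dimension is drawn at random among the highest-variance ones."""

    def __init__(self, data, rng, leaf_size=8, top_dims=5):
        self.data, self.rng = data, rng
        self.leaf_size, self.top_dims = leaf_size, top_dims
        self.root = self._build(np.arange(len(data)))

    def _build(self, idx):
        if len(idx) <= self.leaf_size:
            return idx                                   # leaf: array of point indices
        var = self.data[idx].var(axis=0)
        candidates = np.argsort(var)[-self.top_dims:]    # top-variance dimensions
        dim = int(self.rng.choice(candidates))
        split = float(np.median(self.data[idx, dim]))
        left = idx[self.data[idx, dim] <= split]
        right = idx[self.data[idx, dim] > split]
        if len(left) == 0 or len(right) == 0:
            return idx                                   # degenerate split: stop here
        return (dim, split, self._build(left), self._build(right))

    def leaf(self, q):
        """Descend to a single leaf, without backtracking."""
        node = self.root
        while isinstance(node, tuple):
            dim, split, left, right = node
            node = left if q[dim] <= split else right
        return node

def forest_search(trees, data, q):
    """Union the leaves reached in every tree, then return the index of the
    candidate closest to q by exact distance."""
    candidates = np.unique(np.concatenate([t.leaf(q) for t in trees]))
    dists = ((data[candidates] - q) ** 2).sum(axis=1)
    return int(candidates[dists.argmin()])
```

Because the trees are built independently, each one misses different neighbors. Searching them together recovers accuracy that a single tree would only reach through backtracking.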

Early burst detection (Demo version)

M. Shi (advised by H. Jégou, Y. Avrithis)

EBD

Matlab based on ASMK published in CVPR 2015 2014

EBD is a compact representation for image retrieval. It explicitly detects visual bursts in an image at an early stage, using clustering in the descriptor space. The bursty groups are merged into meta-features, which are used as input to image search systems. It compresses image representations by more than 90% without significant loss in performance. This is a demo version, available upon request.
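
The two stages, burst detection and merging, can be sketched as follows. The sketch assumes L2-normalized descriptors of a single image and uses a simple greedy threshold grouping as a stand-in for the clustering actually used by EBD. `threshold` is a hypothetical parameter.

```python
import numpy as np

def detect_bursts(desc, threshold):
    """Greedily group L2-normalized descriptors (N, D): each ungrouped descriptor
    seeds a group with all ungrouped descriptors closer than `threshold`.
    Returns a group label per descriptor."""
    labels = -np.ones(len(desc), dtype=int)
    n_groups = 0
    for i in range(len(desc)):
        if labels[i] >= 0:
            continue
        close = (np.linalg.norm(desc - desc[i], axis=1) < threshold) & (labels < 0)
        labels[close] = n_groups
        n_groups += 1
    return labels

def meta_features(desc, labels):
    """Merge each group into one L2-normalized meta-feature (sum of its members)."""
    merged = np.stack([desc[labels == g].sum(axis=0) for g in range(labels.max() + 1)])
    return merged / np.linalg.norm(merged, axis=1, keepdims=True)
```

The compression comes from storing one meta-feature per group instead of one descriptor per detected feature.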

Part learning for visual recognition

R. Sicre (advised by T. Furon, E. Kijak, F. Jurie, Y. Avrithis)

Parts

Matlab published in CVPR 2017, CEFRL/ICCV 2017 2016-2017

In image classification, learning mid-level discriminative parts was common even before deep learning. Discovery of discriminative parts casts this as a quadratic assignment problem, allowing the use of a number of optimization algorithms on top of CNN representations. Unsupervised part learning extends this work by dispensing with the need for class labels during part learning. It applies equally to classification and instance retrieval, bringing significant gains to both. The code is not public.

Semi-supervised active learning

M. Budnik, O. Siméoni (advised by Y. Avrithis)

SSAL

15

PyTorch based on DeepCluster, DLP MIT license published in ICPR 2020 2019-2020

This is a deep active learning framework allowing a systematic evaluation of different acquisition functions with or without methods that make use of the unlabeled data during model training. In particular, this includes (i) unsupervised pre-training, as implemented by DeepCluster, and (ii) semi-supervised learning, as implemented by our deep label propagation. The code allows the reproduction of the results of our ICPR 2020 paper.
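
As an example of an acquisition function, here is a sketch of entropy-based selection. Given predicted class probabilities on the unlabeled pool, it picks the examples whose predictions are most uncertain. Entropy is a standard choice shown for illustration only, and it does not indicate which functions the framework actually evaluates.

```python
import numpy as np

def entropy_acquisition(probs, budget):
    """Pick the `budget` unlabeled examples with the most uncertain predictions.

    probs: (N, K) predicted class probabilities for N unlabeled examples.
    Returns indices sorted from most to least uncertain.
    """
    entropy = -(probs * np.log(np.clip(probs, 1e-12, 1.0))).sum(axis=1)
    return np.argsort(-entropy)[:budget]
```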

Deep spatial matching

O. Siméoni (advised by O. Chum, Y. Avrithis)

DSM

MatConvNet, C++ based on CIR MIT license published in CVPR 2019 2018-2019

DSM exploits the sparsity of convolutional activations to detect local features and provide spatial matching for image retrieval. Without modifying the network architecture or re-training, without even local descriptors or vocabularies, deep spatial matching sets a new state of the art in particular object retrieval with a compact representation. The code allows the reproduction of the results of our CVPR 2019 paper.

Graph-based object discovery

O. Siméoni, A. Iscen (advised by G. Tolias, O. Chum, Y. Avrithis)

GOD

MatConvNet based on Diffusion, AGM published in WACV 2018, MVA 2019 2017-2018

GOD captures discriminative patterns from regional CNN activations of an entire dataset, suppressing background clutter. A saliency measure is defined, based on a centrality measure of a nearest neighbor graph constructed from regional CNN representations of dataset images. Salient regions are then detected using an extended version of expanding Gaussian mixture. The code is not public yet.
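
The graph part can be sketched as follows. The sketch assumes L2-normalized regional descriptors, a k-NN graph with cosine-similarity weights, and weighted in-degree as the centrality measure. The centrality measure actually used by GOD may differ, and both functions are hypothetical.

```python
import numpy as np

def knn_graph(x, k):
    """k-NN affinity matrix (N, N) from L2-normalized descriptors x (N, D).
    Entry (i, j) holds the non-negative cosine similarity if j is among
    the k nearest neighbors of i, and zero otherwise."""
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)                # exclude self-matches
    nn = np.argsort(-sim, axis=1)[:, :k]
    a = np.zeros_like(sim)
    rows = np.arange(len(x))[:, None]
    a[rows, nn] = np.maximum(sim[rows, nn], 0)    # keep non-negative similarities
    return a

def centrality(a):
    """Weighted in-degree: how strongly each region is chosen as a neighbor by others."""
    return a.sum(axis=0)
```

Regions that recur across many images are chosen as neighbors often and score high, while isolated background regions score low.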

Revisiting Google Landmark Dataset v2 Clean

C.H. Song (advised by Y.H. Gu, Y. Avrithis)

RGLDv2-clean

2

MIT license published in CVPR 2024 2024

How important is it for training and evaluation sets to not have class overlap in image retrieval? We revisit Google Landmarks v2 clean, the most popular training set, by identifying and removing class overlap with Revisited Oxford and Paris, the most popular evaluation set. Comparing the original and the new RGLDv2-clean on a benchmark of reproduced state-of-the-art methods, we find striking results: not only is there a dramatic drop in performance, but the drop is inconsistent across methods, changing the ranking.

Scene maps

Y. Kalantidis, E. Spyrou, G. Tolias (advised by Y. Avrithis)

Scene-maps

C++ based on ivl, LPSolve published in ACM-MM 2010, MTAP 2011 2010-2011

Scene maps refers to a representation of image collections used for large scale image search and mining, and applied to location and landmark recognition. Starting from a geo-tagged dataset, we first group images geographically and then visually, where each visual cluster is assumed to depict different views of the same scene. We align all views to one reference image and construct a 2D scene map by preserving details from all images while discarding repeating visual features. A scene map thus collectively represents a scene as seen from different viewpoints. The indexing, retrieval and spatial matching scheme then operates directly on scene maps. All clustering operations are based on kernel vector quantization (KVQ). The code is not public.

Part-aware editable 3D shape generation

K. Tertikas (advised by D. Paschalidou, J.J. Park, Y. Avrithis)

PartNeRF

42

PyTorch published in CVPR 2023 2023

PartNeRF is a part-aware generative model for editable 3D shape synthesis. It does not require explicit 3D or part supervision and is able to produce textures. It generates objects as a set of locally defined NeRFs, augmented with an affine transformation. This enables editing operations such as applying transformations on parts, mixing parts from different objects etc. The color of each ray is only determined by a single NeRF. As a result, altering one part does not affect the appearance of the others. The code allows the reproduction of the results of our CVPR 2023 paper.

Graph convolutional cleaning

A. Iscen, G. Tolias (advised by O. Chum, C. Schmid, Y. Avrithis)

GCC

23

PyTorch based on GCN, FAISS Apache-2.0 license published in ECCV 2020 2019-2020

We learn a classifier from few clean and many noisy labels. The structure of clean and noisy data is modeled by a graph per class and graph convolutional networks are used to predict class relevance of noisy examples. This cleaning method is evaluated on an extended version of a few-shot learning problem, where the few clean examples of novel classes are supplemented with additional noisy data. The code allows the reproduction of the results of our ECCV 2020 paper.
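
Graph convolutional networks propagate features over the graph. The sketch below shows one layer of the standard propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W). GCC's exact architecture, graph construction and loss are described in the paper and are not reproduced here.

```python
import numpy as np

def gcn_layer(adj, h, w):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: (N, N) symmetric adjacency matrix; h: (N, F) node features; w: (F, F') weights.
    """
    a_hat = adj + np.eye(len(adj))                          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))           # degrees are >= 1
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)
```

Stacking such layers lets each noisy example's relevance score depend on its clean and noisy neighbors in the class graph.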

Mining on manifolds

A. Iscen, G. Tolias (advised by O. Chum, Y. Avrithis)

MoM

34

PyTorch, MatConvNet based on MLbench, DLP published in CVPR 2018 2017-2018

MoM is one of the very few self-supervised metric learning methods. Building on findings of manifold similarity, it learns a representation space where Euclidean neighbors are determined according to manifold neighbors in the original feature space. It is applied to fine-grained classification as well as particular object retrieval.

Diffusion for image retrieval

A. Iscen, G. Tolias (advised by O. Chum, T. Furon, Y. Avrithis)

Diffusion

87

Matlab based on Yael GNU GPL3+ license published in CVPR 2017 2016-2017

Diffusion is a manifold search method that uses a random walk on the nearest neighbor graph of a dataset. It has been extended to a spectral approach and a hybrid variant of the two for image retrieval. The code and data allow the reproduction of the results of our CVPR 2017 paper. In particular, we provide the descriptors used and the necessary ground-truth files for mAP evaluation. We also make available the approximate $k$-NN graph computed off-line for large-scale datasets.
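
The random-walk formulation can be sketched as follows. The sketch assumes a symmetric k-NN affinity matrix A and an initial score vector y that is nonzero only at the query's neighbors. It solves (I - alpha S) f = y with S = D^-1/2 A D^-1/2 using a dense solver for clarity. For large sparse graphs, an iterative solver such as conjugate gradient is the natural replacement. The spectral and hybrid variants are not shown.

```python
import numpy as np

def diffusion_scores(affinity, y, alpha=0.99):
    """Diffuse initial query scores y over a k-NN graph.

    affinity: (N, N) symmetric non-negative affinity matrix of the k-NN graph.
    y:        (N,) initial scores, e.g. similarity of the query to its nearest
              database neighbors and zero elsewhere.
    Solves (I - alpha * S) f = y with S = D^-1/2 A D^-1/2, the fixed point of
    the random walk f <- alpha * S f + y.
    """
    deg = affinity.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])   # isolated nodes stay at zero
    s = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.linalg.solve(np.eye(len(affinity)) - alpha * s, y)
```

Database items are then ranked by decreasing f, so items connected to the query through chains of neighbors can rank high even when they are far from it in Euclidean distance.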

Aggregated selective match kernel (Python version)

T. Jenicek, G. Tolias (advised by O. Chum)

ASMK-py

67

Python based on ASMK, CIR-torch, FAISS MIT license published in ICCV 2013, HOW 2020

This is a Python implementation of ASMK. There are minor differences compared to the original ASMK method (ICCV 2013) and Matlab implementation, which are described in the HOW paper (ECCV 2020).

Aggregated selective match kernel (Matlab version)

G. Tolias (advised by H. Jégou, Y. Avrithis)

ASMK

22

Matlab based on Yael published in ICCV 2013, IJCV 2016 2013

ASMK is a method for image search using local features and a combination of inverted files with compact binary descriptors. This model encompasses as special cases aggregated representations like VLAD and matching techniques such as Hamming Embedding. Bridging these approaches, it takes the best of existing methods by combining an aggregation procedure with a selective match kernel. It was a state-of-the-art method before deep learning, and it also applies to CNN features. The code allows the reproduction of the results of our ICCV 2013 paper as well as part of the experiments on revisited Oxford and Paris.
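
Below is a minimal NumPy sketch of the non-binarized kernel. It assumes that descriptors are already assigned to visual words. Residuals are aggregated and L2-normalized per word. The similarity of two images is then the selective function sign(u)|u|^alpha, applied only above a threshold tau and summed over their common words. Image-level normalization and the binarized variant are omitted, and the default alpha and tau are illustrative.

```python
import numpy as np

def aggregate(desc, words, centroids):
    """Per visual word, sum the residuals of the image's descriptors assigned to it
    and L2-normalize the sum (the aggregated residual of that word)."""
    agg = {}
    for x, c in zip(desc, words):
        c = int(c)
        agg[c] = agg.get(c, 0.0) + (x - centroids[c])
    return {c: v / np.linalg.norm(v) for c, v in agg.items() if np.linalg.norm(v) > 0}

def selectivity(u, alpha=3.0, tau=0.0):
    """Selective function: sign(u) |u|^alpha above the threshold tau, zero otherwise."""
    return np.sign(u) * np.abs(u) ** alpha if u > tau else 0.0

def asmk_similarity(agg_x, agg_y, alpha=3.0, tau=0.0):
    """Sum the selective kernel over visual words shared by the two images."""
    return sum(selectivity(float(agg_x[c] @ agg_y[c]), alpha, tau)
               for c in agg_x.keys() & agg_y.keys())
```

Aggregation removes the burstiness of repeated descriptors within a word. The selective function down-weights weak matches and discards those below tau.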

Feature Selection by Symmetry

G. Tolias, Y. Kalantidis (advised by Y. Avrithis)

SymCity

C++ based on HPM, ivl published in ACM-MM 2012 2012

To reduce the space required for the index in large scale search, several methods focus on feature selection based on multiple views. In practice however, most images are unique, in the sense that they depict a unique view of an object or scene in the dataset and there is nothing to compare to. SymCity selects features in such unique images by self-similarity. In effect, we detect repeating patterns or local symmetries and select the participating features. The method itself is a variant of HPM, called Hough pyramid self-matching (HPSM) and maintains the same retrieval performance using only 20% of the required memory. The code is not public.

World Cities 2M

G. Tolias (advised by Y. Avrithis)

WC2M

based on Flickr published in ICCV 2011 2011

WC2M consists of 2.2M geo-tagged images from 40 cities, crawled from Flickr using geographic queries covering a window around each city center. It is meant to be used as a distractor set along with any annotated test set for image retrieval. It also includes the test set of the EC1M dataset and is a superset of both EC1M and EC50k. The dataset is challenging because both the test set and the distractors mostly depict urban scenery.

Hough pyramid matching (Public version)

G. Tolias (advised by Y. Avrithis)

HPM

C++ based on ivl published in ICCV 2011, IJCV 2014 2010-2011

HPM is a spatial matching method applied to geometry re-ranking for large scale search. It is based on a relaxed spatial matching model, which applies pyramid matching to the Hough transformation space. It is invariant to similarity transformations and free of inlier-count verification. It imposes one-to-one mapping and is flexible, allowing non-rigid motion and multiple matching surfaces or objects. It is linear in the number of correspondences and extremely fast in practice.
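
A simplified sketch of the pyramid step follows. It assumes that each tentative correspondence is already mapped to a 4D relative transformation (log-scale, orientation, translation) in comparable units. Bins double in size at each level. A correspondence gains strength 2^-l for each partner it first shares a bin with at level l, so groupings at finer levels count more, and the image similarity can then be taken as the sum of the strengths. The one-to-one constraint, orientation wrap-around and vocabulary weighting of the actual method are omitted. `levels` and `finest` are hypothetical parameters.

```python
import numpy as np
from collections import defaultdict

def hpm_strengths(transforms, levels=5, finest=0.05):
    """Simplified Hough pyramid matching over N tentative correspondences.

    transforms: (N, 4) relative transformation of each correspondence.
    At level l (0 = finest) the bin size is finest * 2**l; bins are nested,
    so partners met at a finer level are never counted again.
    Returns the (N,) strength of each correspondence.
    """
    n = len(transforms)
    strength = np.zeros(n)
    seen = [set() for _ in range(n)]      # partners already counted at finer levels
    for level in range(levels):
        bins = defaultdict(list)
        keys = np.floor(transforms / (finest * 2 ** level)).astype(int)
        for i, key in enumerate(keys):
            bins[tuple(key)].append(i)
        for members in bins.values():
            for i in members:
                new = set(members) - seen[i] - {i}
                strength[i] += len(new) * 2.0 ** -level
                seen[i] |= new
    return strength
```

Each level is a single pass of hashing correspondences into bins, so the cost is linear in the number of correspondences, consistent with the claim above.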

European Cities 1M

Y. Kalantidis, G. Tolias (advised by Y. Avrithis)

EC1M

based on Flickr published in ACM-MM 2010 2010

EC1M consists of 909k geo-tagged images from 22 European cities, crawled from Flickr using geographic queries covering a window around each city center. A subset of 1,081 images from Barcelona is annotated into 35 groups depicting the same scene; 17 of the groups are landmark scenes and 18 are non-landmark. Annotation is based on tags and on visual search / manual clean-up, respectively. In total, 157 of those images are defined as queries (up to 5 per group). Images of the remaining 21 cities are used as distractors. Most depict urban scenery like the ground truth, making a challenging distractor dataset.

European Cities 50k

G. Tolias, Y. Kalantidis (advised by Y. Avrithis)

EC50k

based on Flickr published in ACM-MM 2010 2010

EC50k consists of 50,767 geo-tagged images from 14 European cities, crawled from Flickr using geographic queries covering a window around each city center. A subset of 778 images from 9 cities is annotated into 20 groups depicting the same scene. Annotation is based on tags and visual search / manual clean-up. In total, 100 of those images are defined as queries (5 per group). Images of the remaining 5 cities are used as distractors. Most depict urban scenery like the ground truth, making a challenging distractor dataset.

Feature map hashing/similarity (Production version)

G. Tolias, Y. Kalantidis (advised by Y. Avrithis)

FMH-prod

C++ based on ivl, FMH published in ACM-MM 2010, CVIU 2014 2009-2012

This is the production version of FMH. The code is not public.

IVA visual representation, matching and search infrastructure

G. Tolias, Y. Kalantidis (advised by Y. Avrithis)

iva

C++ based on ivl, OpenCV 2008-2014

This is a collection of software that has been used internally as infrastructure within the IVA research team for several other projects, most notably Scene-maps, FMH, HPM, SymCity, AGM, DRVQ and VIRaL. It provides a common interface to frequently used data structures and a number of individual software components to support common tasks. Such tasks include local feature detection and descriptor computation, nearest neighbor search and clustering, aggregated representations like histograms and sparse sets used e.g. for bag-of-words and related models, matching methods including pyramid matching, algorithms like radix sort, set operations like intersection and unique element count, inverted file structures for indexing, as well as dataset organization and evaluation protocols.

Most of the software uses ivl, which itself has evolved to support the needs of this software. In many cases OpenCV is also required, but otherwise dependencies are kept to a minimum and confined to individual components. The software includes dozens of individual components and hundreds of source files. Each component is typically accompanied by a sample project for Linux and Windows demonstrating its use. The code is not public.
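The IVA components themselves are written in C++ and are not public. As an illustration of two of the tasks listed above, inverted file indexing and intersection of sorted sets, here is a minimal Python sketch under stated assumptions: the class `InvertedFile` and the function `intersect_sorted` are hypothetical names, images are represented as lists of visual word ids, and scoring by the number of shared words is a simplification rather than the scoring used in IVA.

```python
from collections import defaultdict


class InvertedFile:
    """Hypothetical minimal inverted file: visual word id -> sorted image ids."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, image_id, words):
        # Index each distinct word once per image; image ids are assumed
        # to be added in increasing order, so posting lists stay sorted.
        for w in sorted(set(words)):
            self.postings[w].append(image_id)

    def query(self, words):
        # Vote for every image sharing a distinct word with the query.
        votes = defaultdict(int)
        for w in set(words):
            for image_id in self.postings.get(w, []):
                votes[image_id] += 1
        # Rank by number of shared words; ties broken by image id.
        return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


def intersect_sorted(a, b):
    """Linear-time intersection of two sorted lists of unique ids."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


index = InvertedFile()
index.add(0, [3, 7, 7, 12])
index.add(1, [7, 12, 20])
index.add(2, [1, 2])
print(index.query([7, 12, 99]))                                 # [(0, 2), (1, 2)]
print(intersect_sorted(index.postings[7], index.postings[12]))  # [0, 1]
```

Keeping posting lists sorted by image id is what makes the linear-time merge in `intersect_sorted` possible; this is one common design choice, assumed here for simplicity.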

Flickr Logos 27

Y. Kalantidis, L.G. Pueyo, M. Trevisiol (advised by R. van Zwol, Y. Avrithis)

Logos27

based on Flickr Identity + Logo Design published in ICMR 2011 2011

This is an annotated logo dataset downloaded from the Flickr group Identity + Logo Design; it contains more than 4000 logo classes/brands in total. It consists of a training, a distractor and a query set. The training set contains 810 images annotated with bounding boxes into 27 classes. The distractor set contains 4207 logo images/classes depicting clean logos. The query set contains 270 images, half of which are annotated into the 27 training classes while the other half do not depict logos.

Weighted alpha-shapes

C. Varytimidis (advised by K. Rapantzikos, Y. Avrithis)

WaSH

C++ based on OpenCV, CGAL, Boost published in ECCV 2012, PR 2016 2011-2012

WaSH is a local feature detector. Given an input image, it computes a list of detected features, optionally with descriptors. It starts from a sample of image edge points and relies on shape stability measures across the weighted $\alpha$-filtration, a computational geometry construction that captures the shape of a non-uniform set of points. Detected features are blob-like and include non-extremal regions as well as regions determined by cavities of the boundary shape.

multi-object Discovery and tRAcking

S. Venkataramanan (advised by J. Carreira, Y.M. Asano, Y. Avrithis)

DoRA

26

PyTorch based on DINO published in ICLR 2024 2024

DoRA is a self-supervised image pretraining method tailored for learning from continuous videos. It leverages the attention from the [CLS] token of distinct heads in a vision transformer to identify multiple objects within a given frame and consistently track them across time. A teacher-student distillation loss is then applied to these tracked objects. Importantly, we do not use any off-the-shelf object tracker or optical flow network. This keeps our pipeline simple and requires no additional data or training. The code allows the reproduction of the results of our ICLR 2024 paper.
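To make the first step more concrete, here is a minimal NumPy sketch of turning per-head [CLS] attention into one binary object mask per head. It assumes an attention tensor of shape (heads, 1 + N, 1 + N) from the last transformer block, with [CLS] at token 0 and N patch tokens on an h x w grid, and a hypothetical threshold that keeps the most attended patches covering a fixed fraction of the attention mass (similar in spirit to the DINO attention visualizations). It is not the DoRA code, which additionally matches objects across frames and applies the distillation loss.

```python
import numpy as np


def cls_attention_masks(attn, grid, mass=0.6):
    """Turn per-head [CLS] attention into binary patch masks (hypothetical helper).

    attn: array of shape (heads, 1 + N, 1 + N), token 0 being [CLS]
    grid: (h, w) with h * w == N
    mass: fraction of attention mass each mask keeps (assumed default)
    """
    cls_to_patches = attn[:, 0, 1:]                      # (heads, N)
    masks = np.zeros_like(cls_to_patches, dtype=bool)
    for k, a in enumerate(cls_to_patches):
        p = a / a.sum()                                  # normalize per head
        order = np.argsort(p)[::-1]                      # most attended first
        cum = np.cumsum(p[order])
        keep = order[: np.searchsorted(cum, mass) + 1]   # smallest set reaching `mass`
        masks[k, keep] = True
    return masks.reshape(-1, *grid)                      # (heads, h, w)


# Toy example: 2 heads, 2x2 patch grid (4 patches + [CLS]).
attn = np.zeros((2, 5, 5))
attn[0, 0, 1:] = [0.7, 0.1, 0.1, 0.1]   # head 0 focuses on the top-left patch
attn[1, 0, 1:] = [0.1, 0.1, 0.4, 0.4]   # head 1 focuses on the bottom row
masks = cls_attention_masks(attn, (2, 2))
print(masks.astype(int))
```

Using one mask per head reflects the idea that different heads tend to attend to different objects; the mass-based threshold is just one simple way to binarize the attention.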

Walking Tours

S. Venkataramanan (advised by J. Carreira, Y.M. Asano, Y. Avrithis)

WTours

published in ICLR 2024 2024

The Walking Tours dataset is a unique collection of long-duration egocentric videos captured in urban environments in cities of Europe and Asia. It consists of 10 high-resolution videos, each showing a person walking through a different environment, ranging from city centers to parks to residential areas, under different lighting conditions. A video from a wildlife safari is also included to diversify the dataset with natural environments. The dataset is completely unlabeled and uncurated, making it suitable for self-supervised pretraining.

Mixup for Deep Metric Learning

B. Psomas, S. Venkataramanan (advised by E. Kijak, L. Amsaleg, K. Karantzalos, Y. Avrithis)

Metrix

3

PyTorch based on Proxy Anchor, PyTorch Metric Learning, DML Benchmark published in ICLR 2022 2022

Metric Mix, or Metrix, is a mixup-based interpolation algorithm used as data augmentation for metric learning. It relies on a generalized formulation that encompasses existing metric learning loss functions and modifies it to accommodate mixup. Mixing takes place in the input space, in intermediate representations and in the embedding space, and applies to both examples and target labels. The code allows the reproduction of the results of our ICLR 2022 paper.
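The core interpolation can be sketched in a few lines of NumPy. This is a generic mixup illustration, not the Metrix loss: it assumes a pairwise setting where, relative to an anchor, a positive has target 1 and a negative has target 0, and that examples and targets are mixed with the same coefficient λ drawn from a Beta(α, α) distribution. The function name `mix` and the choice α = 2 are hypothetical.

```python
import numpy as np


def mix(x_pos, x_neg, lam):
    """Interpolate a positive and a negative of the same anchor (hypothetical helper).

    x_pos, x_neg: inputs, intermediate features or embeddings of equal shape.
    Returns the mixed example and its soft pairwise target w.r.t. the anchor.
    """
    x_mix = lam * x_pos + (1.0 - lam) * x_neg
    y_mix = lam * 1.0 + (1.0 - lam) * 0.0   # positive target 1, negative target 0
    return x_mix, y_mix


rng = np.random.default_rng(0)
lam = rng.beta(2.0, 2.0)            # mixing coefficient in (0, 1); alpha = 2 is arbitrary
e_pos = np.array([1.0, 0.0])        # toy embedding of a positive
e_neg = np.array([0.0, 1.0])        # toy embedding of a negative
e_mix, y_mix = mix(e_pos, e_neg, lam)
print(lam, e_mix, y_mix)
```

The same function applies unchanged whether the arrays are raw inputs, intermediate feature maps or final embeddings, which mirrors the three mixing locations mentioned above; how the soft target enters the loss is defined in the paper.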

Aligned feature interpolation

S. Venkataramanan (advised by E. Kijak, L. Amsaleg, Y. Avrithis)

AlignMix

66

PyTorch based on WassDistance MIT license published in CVPR 2022 2022

This is a mixup-based data augmentation method, where we geometrically align two images in the feature space. The correspondences allow us to interpolate between two sets of features, while keeping the locations of one set. The code allows the reproduction of the results of our CVPR 2022 paper.
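A minimal NumPy sketch of the alignment idea follows, under several assumptions: each image is represented by n spatial feature vectors (a flattened feature map of shape n x d), the correspondences come from an entropic optimal transport plan computed with plain Sinkhorn iterations on a squared Euclidean cost, and the aligned features of B are soft matches of each position of A. The names `sinkhorn` and `align_mix`, the cost normalization, and the value of ε are illustrative choices; the paper's exact formulation may differ.

```python
import numpy as np


def sinkhorn(M, eps=0.1, n_iter=200):
    """Entropic OT plan between two uniform distributions for cost matrix M (n x m)."""
    n, m = M.shape
    K = np.exp(-M / eps)
    r, c = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    v = np.ones(m)
    for _ in range(n_iter):
        u = r / (K @ v)
        v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]          # rows ~ 1/n, columns = 1/m


def align_mix(A, B, lam=0.5, eps=0.1):
    """Mix feature sets A and B (n positions x d channels) after aligning B to A."""
    M = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost
    M = M / M.max()                                       # scale to [0, 1] for stability
    P = sinkhorn(M, eps)
    B_aligned = (P / P.sum(1, keepdims=True)) @ B         # soft match of each a_i in B
    return lam * A + (1.0 - lam) * B_aligned              # keeps the positions of A


# Toy example: B holds the same features as A at permuted positions.
A = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
B = A[[2, 0, 1]]
print(np.round(align_mix(A, B), 2))   # close to A, since alignment undoes the permutation
```

Without alignment, mixing `A` and `B` position by position would average unrelated features; after alignment each position of `A` is mixed with its best-matching feature of `B`, which is what keeping "the locations of one set" means here.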

Multi-Target Unsupervised Domain Adaptation without External Data

Y. Xu (advised by P. Ghamisi, Y. Avrithis)

UT-KD

2

PyTorch based on AdaptSegNet, fast-neural-style, SE-GAN MIT license published in arXiv 2024 2024

We introduce a new strategy for semantic segmentation, called "multi-target unsupervised domain adaptation (UDA) without external data". The segmentation model is initially trained on the external data. Then, it is adapted to a new unseen target domain without accessing any external data. This approach is thus more scalable than existing solutions and remains applicable when external data is inaccessible. We demonstrate this strategy using a simple method, "unseen target knowledge distillation" (UT-KD), which incorporates self-distillation and adversarial learning; knowledge acquired from the external data is preserved during adaptation through "one-way" adversarial learning. The code allows the reproduction of the results of our arXiv 2024 paper.
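The distillation component can be illustrated with a generic pixel-wise knowledge distillation loss: the model before adaptation acts as teacher, and the adapting model is encouraged to keep its per-pixel class distributions close to the teacher's. This NumPy sketch assumes a temperature-scaled KL divergence averaged over pixels, with logits of shape (classes, H, W); the helper names, temperature, and shapes are hypothetical, and the "one-way" adversarial term of UT-KD is not modeled.

```python
import numpy as np


def log_softmax(z, T=1.0, axis=0):
    """Temperature-scaled log-softmax along the class axis."""
    z = z / T
    z = z - z.max(axis=axis, keepdims=True)          # numerical stability
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def pixelwise_kd(student_logits, teacher_logits, T=2.0):
    """Mean over pixels of KL(teacher || student) for logits of shape (C, H, W)."""
    log_p_t = log_softmax(teacher_logits, T)
    log_p_s = log_softmax(student_logits, T)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=0)   # (H, W)
    return float(kl.mean()) * T ** 2   # conventional T^2 factor used in distillation


rng = np.random.default_rng(0)
teacher = rng.normal(size=(19, 8, 8))    # 19 classes on an 8x8 map (arbitrary sizes)
student = teacher + 0.1 * rng.normal(size=teacher.shape)
print(pixelwise_kd(student, teacher))    # positive; zero only if the two maps match
```

The loss is zero when the student reproduces the teacher's predictions exactly, so during adaptation it acts as a penalty for drifting away from what was learned on the external data.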

Nano-supervised object detection

Z. Yang (advised by M. Shi, Y. Avrithis)

NSOD

PyTorch based on WSDDN, PCL published in PR 2021 2019-2020

We learn an object detector from a small number of weakly labeled images and a larger set of completely unlabeled images. The main idea is to first learn a classifier in a semi-supervised setting, then use it as a teacher to train a student network on a weakly supervised object detection task. The student detector is based on the PCL weakly supervised object detector. The code is not public yet.

Boundary projection

H. Zhang (advised by T. Furon, L. Amsaleg, Y. Avrithis)

BP

7

TensorFlow based on CleverHans GNU GPL2+ license published in TIFS 2021 2019-2020

BP is an adversarial attack that reduces the distortion of the perturbation while operating under quantization, in very few iterations. BP is also used as a defense, to build more robust models through adversarial training. The code allows the reproduction of the results of our TIFS 2021 paper.

Smooth adversarial examples

H. Zhang (advised by T. Furon, L. Amsaleg, Y. Avrithis)

SAE

6

TensorFlow, Matlab based on CleverHans, C&W GNU GPL2+ license published in JIS 2020 2018-2019

Smooth adversarial examples (SAE) are a particular form of photorealistic, on-manifold adversarial examples that are actually more effective than ordinary off-manifold examples despite their spatial constraints: our attack has the same probability of success at lower distortion. The perturbation is locally smooth on flat areas of the input image, but it may be noisy on textured areas and sharp across edges. This relies on Laplacian smoothing, which we integrate into the attack pipeline. The code allows the reproduction of the results of our JIS 2020 paper.
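The effect of edge-aware Laplacian smoothing can be seen on a 1-D toy signal. In this NumPy sketch, neighboring samples are connected with weights exp(-(Δx)²/σ²) computed from the input signal x, and the perturbation d is smoothed by solving (I + αL)s = d, where L is the resulting graph Laplacian. The weighting, α, σ and the function name `edge_aware_smooth` are assumptions for illustration; the paper works on 2-D images and its exact operator may differ.

```python
import numpy as np


def edge_aware_smooth(x, d, alpha=10.0, sigma=0.1):
    """Smooth perturbation d guided by signal x by solving (I + alpha * L) s = d."""
    n = len(x)
    W = np.zeros((n, n))
    for i in range(n - 1):
        # Weight between neighbors: ~1 on flat areas, ~0 across strong edges of x.
        w = np.exp(-((x[i + 1] - x[i]) ** 2) / sigma ** 2)
        W[i, i + 1] = W[i + 1, i] = w
    L = np.diag(W.sum(1)) - W                   # graph Laplacian
    return np.linalg.solve(np.eye(n) + alpha * L, d)


x = np.array([0.0] * 5 + [1.0] * 5)             # input signal with one step edge
d = np.array([1.0, -1.0] * 5)                   # noisy perturbation
s = edge_aware_smooth(x, d)
print(np.round(s, 3))   # nearly constant on each side, with a jump at the edge
```

Because L has zero row sums, the smoothing preserves the total perturbation within each region that the edge separates, so the result stays flat inside regions yet remains free to jump across the edge, matching the behavior described above.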

A

Avrithis, Yannis

B

Budnik, Mateusz

E

Engin, Deniz

I

Iscen, Ahmet

J

Jenicek, Tomas

K

Kakogeorgiou, Ioannis

Kalantidis, Yannis

Kontosis, Kimon

L

Lazarou, Michalis

Lifchitz, Yann-Raphaël

M

Melina, Clayton

Mylonas, Phivos

N

Nathanail, Spyros

Neitthoffer, Timothée

P

Phinikettos, Marios

Psomas, Bill

Pueyo, Lluis Garcia

R

Radenovic, Filip

Rapantzikos, Konstantinos

S

Samaras, Georgios

Shi, Miaojing

Sicre, Ronan

Siméoni, Oriane

Skalkotos, Nikos

Song, Chull Hwan

Spyrou, Evaggelos

T

Tertikas, Konstantinos

Tolias, Giorgos

Trevisiol, Michele

V

Varytimidis, Christos

Venkataramanan, Shashanka

X

Xu, Yonghao

Y

Yang, Zhaohui

Z

Zhang, Hanwei