Label Embedding Online Hashing for Cross-Modal Retrieval
2022-07-21 22:00:00 【Schrodinger's fat stupid dog】
ACM MM 2020
Yongxin Wang, Shandong University
Research questions in online hashing:
- How to effectively use semantic information
- How to solve binary optimization problems discretely
- How to update hash codes and hash functions efficiently
Summary
- A label embedding framework, consisting of label similarity preservation and label reconstruction, is established; it can generate discriminative binary codes while reducing computational complexity.
- Besides preserving the pairwise similarity of the incoming data, the algorithm also builds a bridge between the newly arriving data and the existing data through inner-product minimization on the block similarity matrix. It can therefore exploit more similarity information, making the optimization less sensitive to the incoming data and yielding more effective binary codes.
- A discrete optimization algorithm is designed to solve the binary optimization problem without relaxation, so the quantization error is reduced.
- Its computational complexity depends only on the size of the newly arriving data, which makes it highly efficient.
- The hash codes of existing data are kept unchanged while the hash functions are updated efficiently, which makes it easy to scale to large-scale datasets.
Problems raised
- Most online hashing methods are designed for uni-modal retrieval and are difficult to extend directly to cross-modal retrieval. Although some online cross-modal hashing models have been proposed, their performance is unsatisfactory because the correlation between heterogeneous modalities is difficult to capture.
- They update the hash functions based only on the newly arriving data and ignore the association between the new data and the existing data, so information about the existing data may be lost, leading to unstable results.
- The update scheme of existing online hashing methods is inefficient. In these methods the hash functions can be retrained efficiently with the new data; however, the hash codes must be recomputed for all accumulated data, so the computational complexity depends on the size of the entire accumulated database, making learning on large datasets very inefficient.
- Discrete optimization remains an open problem in online hashing. Most methods adopt relaxation strategies, which introduce large quantization error. Moreover, in the online scenario the similarity between new data and existing data is imbalanced, i.e., most pairs are dissimilar and only a few are similar. Direct discrete optimization methods from offline hashing, such as discrete cyclic coordinate descent (DCC), are no longer applicable because their optimization relies heavily on dissimilar pairs.
Method details
Hash code learning
Definitions:
Partition S(t) into blocks according to the existing ("old") data and the newly coming data, giving the sub-matrices S(oo), S(oc), S(co), and S(cc).
The batch-based (offline) hashing objective:
becomes the online hashing objective (with B(old) kept unchanged):
As the accumulated data grow, S(oc) and S(co) become sparse and imbalanced, i.e. most of their elements are -1. Because hard binary matrix factorization tends to be biased toward preserving the dissimilar information while losing the similar information, applying direct discrete optimization here may cause large information loss. Therefore a real-valued matrix V(t) is used in place of B(t); analogously to B(t), V(old) is kept unchanged and only V(coming) is updated. To keep V(t) unbiased, an orthogonality constraint and a balance constraint are further introduced.
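A minimal LaTeX sketch of this structure, under assumptions of my own: the common inner-product formulation scaled by the code length r, the decorrelation and balance constraints in their usual form, and with the label-embedding/reconstruction terms (the P step below) omitted:

```latex
% Block partition of the label-based similarity matrix at round t
% (o = existing/"old" data, c = newly coming data); V^{(t)} is the
% real-valued surrogate of B^{(t)}, with the old part frozen.
\[
S^{(t)} =
\begin{bmatrix}
S^{(oo)} & S^{(oc)}\\
S^{(co)} & S^{(cc)}
\end{bmatrix},
\qquad
V^{(t)} = \bigl[\, V^{(\mathrm{old})},\; V^{(c)} \,\bigr]
\]
% Assumed form of the online objective (r = code length, n_t = number of
% accumulated samples); only V^{(c)} is updated:
\[
\min_{V^{(c)}}
\bigl\| r\,S^{(cc)} - V^{(c)\top} V^{(c)} \bigr\|_F^{2}
+ \bigl\| r\,S^{(oc)} - V^{(\mathrm{old})\top} V^{(c)} \bigr\|_F^{2}
\quad \text{s.t.}\;
V^{(t)} V^{(t)\top} = n_t I,\;
V^{(t)} \mathbf{1} = \mathbf{0}.
\]
```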
The optimization process
- Optimize P
- Optimize V
The simplified form of the original objective function with respect to V is
Let
which gives:
- Optimize R
A classical orthogonal Procrustes problem, solved in closed form via SVD:
- Optimize B (a schematic sketch of the full alternating loop follows this list)
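A schematic Python sketch of one alternating round, assuming (rather than quoting) the update shapes: a least-squares surrogate for the V-step, an SVD-based Procrustes solution for R, and B taken as the sign of the rotated embedding. The label-embedding P-step is omitted and all names (`online_round`, `procrustes_rotation`) are illustrative only:

```python
import numpy as np

def procrustes_rotation(V, B):
    """Closed-form orthogonal Procrustes step: the rotation R (r x r)
    minimizing ||B - R^T V||_F subject to R^T R = I, obtained via SVD."""
    U, _, Vt = np.linalg.svd(V @ B.T)
    return U @ Vt

def online_round(S_cc, S_oc, V_old, r, n_iters=5, seed=0):
    """One online round (schematic): learn a real-valued embedding V_c
    (r x n_c) for the newly arriving chunk while V_old (r x n_o) stays
    fixed, then binarize. The V-step below is a simple least-squares
    surrogate, not the paper's exact update rule."""
    rng = np.random.default_rng(seed)
    n_c = S_cc.shape[0]
    V_c = 0.1 * rng.standard_normal((r, n_c))
    R = np.eye(r)
    for _ in range(n_iters):
        # V-step (surrogate): fit inner products to the scaled similarities,
        # r*S_cc ~ V_c^T V_c and r*S_oc ~ V_old^T V_c, freezing the left factor.
        A = V_old @ V_old.T + V_c @ V_c.T + 1e-3 * np.eye(r)
        V_c = np.linalg.solve(A, r * (V_c @ S_cc + V_old @ S_oc))
        # R-step: orthogonal Procrustes between the embedding and current codes.
        B_c = np.sign(R.T @ V_c)
        R = procrustes_rotation(V_c, B_c)
    # B-step: binarize the rotated embedding; codes of old data stay untouched.
    B_c = np.sign(R.T @ V_c)
    B_c[B_c == 0] = 1
    return V_c, B_c
```

Note that the r x r statistic V_old V_old^T can be cached and updated incrementally, so the per-round cost depends only on the size of the new chunk, consistent with the complexity claim above.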
Hash function learning
Objective function
Optimize W(m)
Generating hash codes:
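The post omits the concrete equations here, so the following is an assumed but common choice: learn a linear hash function per modality by ridge regression onto the learned codes, and generate codes for unseen samples by taking the sign of the projection (function names and the parameter `lam` are illustrative):

```python
import numpy as np

def learn_hash_function(X_m, B, lam=1e-2):
    """Ridge regression of the codes onto modality-m features.
    X_m: d_m x n feature matrix, B: r x n code matrix (+/-1).
    Returns W_m (d_m x r) such that sign(W_m^T x) approximates the code."""
    d_m = X_m.shape[0]
    return np.linalg.solve(X_m @ X_m.T + lam * np.eye(d_m), X_m @ B.T)

def generate_codes(X_m, W_m):
    """Hash codes for new samples of modality m (columns of X_m)."""
    codes = np.sign(W_m.T @ X_m)
    codes[codes == 0] = 1
    return codes
```

In the online setting the two accumulated statistics X_m X_m^T and X_m B^T can be updated incrementally with each chunk, so retraining W(m) never requires revisiting the old raw data.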
Experiments
Dataset descriptions:
- MIRFlickr
Contains 25,000 instances, each annotated with at least one of 24 categories. Each text is represented by a 1,386-dimensional bag-of-words vector and each image by a 512-dimensional GIST feature vector. After removing instances whose text tags appear fewer than 20 times, 20,015 instances are used. 2,000 instances are randomly selected as the query set. To simulate the online scenario, the remaining data are divided into 9 chunks: the first 8 chunks contain 2,000 instances each and the last chunk contains 2,015. In each round, one new chunk is added to the database.
- IAPR TC-12
Contains 20,000 image-text pairs from 255 categories collected around the world. Each image is represented by a 512-dimensional GIST feature vector and each text by a 2,912-dimensional bag-of-words vector. 2,000 image-text pairs are randomly selected as the query set. The rest are divided evenly into 9 chunks of 2,000 pairs each.
- NUS-WIDE
Contains 269,648 image-text pairs from 81 categories; images and texts are represented by 500-dimensional SIFT feature vectors and 1,000-dimensional binary tag vectors, respectively. The 10 most frequent tags and the corresponding 186,577 points are selected from the original dataset. 2,000 images and their associated texts are randomly selected as queries. The rest are divided into 18 chunks: the first 17 chunks contain 10,000 points each and the last chunk contains 14,577.
Evaluation indicators
mAP analysis
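The main retrieval metric is mean average precision (mAP) under Hamming ranking; a small illustration of how it is typically computed (my own sketch, not the paper's evaluation code):

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, relevance):
    """mAP under Hamming ranking.
    query_codes: q x r in {-1,+1}; db_codes: n x r in {-1,+1};
    relevance: q x n boolean matrix (e.g. "shares at least one label")."""
    r = query_codes.shape[1]
    aps = []
    for i in range(query_codes.shape[0]):
        # Hamming distance from the inner product of +/-1 codes.
        hamming = (r - query_codes[i] @ db_codes.T) / 2
        rel = relevance[i, np.argsort(hamming)]
        if not rel.any():
            continue
        prec_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
        aps.append((prec_at_k * rel).sum() / rel.sum())
    return float(np.mean(aps))
```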
Training time analysis
Parameter sensitivity analysis
Baselines
SCM-seq
OCMH
OLSH
DCH
LCMFH
SCRATCH
DLFH
See the code for details.