Abstract: Using PostgreSQL for video and image deduplication and for image search. Author: digoal. Date: 2016-11-26. Tags: PostgreSQL, Haar wavelet, image search, image deduplication, video deduplication. Background: many business scenarios involve image processing, for example image search, video deduplication, face recognition, photo beautification, image deduplication ...
PostgreSQL for video and image deduplication, and its application to image search
Author
digoal
Date
2016-11-26
Tags
PostgreSQL, Haar wavelet, image search, image deduplication, video deduplication
Background
Many business scenarios involve image processing, for example image search, video deduplication, face recognition, photo beautification, and image deduplication.
Take video deduplication: users upload many videos, and the same movie may exist in several versions that differ in resolution, audio track, or compression ratio. As a result, the server ends up storing a large number of duplicate videos.
Another example is screening for pornographic videos and pictures; the job of the human content reviewer may be on its way out.
Is there a way to find duplicate videos? How can pornographic videos and pictures be identified? This article explains.
Image search, meanwhile, is the second most commonly used kind of search after text search.
The common search engines on the market, such as Google, Baidu, and Sogou, all offer image search.
For example, upload a picture of a snowman through the interface the search engine provides, and it returns a pile of pictures that resemble the snowman.
How does image search work?
The almighty PostgreSQL does not miss out on something this fun: through PostgreSQL's general-purpose extension API, it can be extended with image search.
If you are interested in developing PostgreSQL extensions, see my article
"Find the G-spot of your business and enjoy the thrill - PostgreSQL Kernel Extension Guide"
Background of the PostgreSQL image search plugin
The PostgreSQL image search plugin stores images after applying the Haar wavelet transform, a mainstream technique. For background on Haar wavelets, see the Wikipedia article and the literature below.
https://en.wikipedia.org/wiki/Haar_wavelet
http://www.cs.toronto.edu/~kyros/courses/320/Lectures.2013s/lecture.2013s.10.pdf
https://wiki.postgresql.org/images/4/43/Pgcon_2013_similar_images.pdf
A few pages are excerpted from these slides; be warned, the material is demanding.
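To make the idea behind the Haar wavelet transform concrete, here is a minimal sketch in plain Python. It uses the unnormalized averaging-and-differencing variant and is not the imgsmlr implementation; the function names `haar_step`, `haar_1d`, and `haar_2d` are illustrative assumptions. The transform replaces each pair of values by their average and half their difference, then repeats on the averages.

```python
def haar_step(values):
    """One Haar level: pairwise averages followed by pairwise half-differences."""
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages + details


def haar_1d(values):
    """Full 1D decomposition; len(values) is assumed to be a power of two."""
    out = list(values)
    length = len(out)
    while length > 1:
        # Re-transform only the averages produced by the previous level.
        out[:length] = haar_step(out[:length])
        length //= 2
    return out


def haar_2d(image):
    """Standard 2D decomposition: transform every row, then every column."""
    rows = [haar_1d(row) for row in image]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


print(haar_1d([9, 7, 3, 5]))        # [6.0, 2.0, 1.0, -1.0]
print(haar_2d([[5, 5], [5, 5]]))    # [[5.0, 0.0], [0.0, 0.0]]
```

The first coefficient is the overall average and the rest describe detail at finer and finer scales. A flat image therefore has only one non-zero coefficient, which is why a handful of the largest coefficients can serve as a compact description of an image.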
Introduction to the PostgreSQL image search plugin
The plugin depends on gd.h (the GD library development headers)
# yum install -y gd-devel
Download and install imgsmlr
$ git clone https://github.com/postgrespro/imgsmlr
$ cd imgsmlr
$ export PGHOME=/home/digoal/pgsql9.5
$ export PATH=$PGHOME/bin:$PATH:.

$ make USE_PGXS=1
$ make USE_PGXS=1 install
Install the extension
$ psql
psql (9.5.3)
Type "help" for help.
postgres=# create extension imgsmlr;
CREATE EXTENSION
imgsmlr adds two data types
| Datatype | Storage length | Description |
|---|---|---|
| pattern | 16388 bytes | Result of Haar wavelet transform on the image |
| signature | 64 bytes | Short representation of pattern for fast search using GiST indexes |
It also adds GiST index support and a KNN distance operator, `<->`, defined for both the pattern and signature types, which can be used to search by similarity
| Operator | Left type | Right type | Return type | Description |
|---|---|---|---|---|
| <-> | pattern | pattern | float8 | Euclidean distance between two patterns |
| <-> | signature | signature | float8 | Euclidean distance between two signatures |
Several functions are added
They convert an image's binary data to the pattern type, and convert the data stored in a pattern to the signature type
| Function | Return type | Description |
|---|---|---|
| jpeg2pattern(bytea) | pattern | Convert jpeg image into pattern |
| png2pattern(bytea) | pattern | Convert png image into pattern |
| gif2pattern(bytea) | pattern | Convert gif image into pattern |
| pattern2signature(pattern) | signature | Create signature from pattern |
| shuffle_pattern(pattern) | pattern | Shuffle pattern for less sensitivity to image shift |
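Testing the PostgreSQL image search plugin
Import some images (the more the better), for example as follows
Create the image table
create table image (id serial, data bytea);
Import an image into the database
insert into image(data) select pg_read_binary_file('file_path');  -- replace file_path with the path of the image file
Convert the images into pattern and signature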
CREATE TABLE pat AS (
    SELECT
        id,
        shuffle_pattern(pattern) AS pattern,
        pattern2signature(pattern) AS signature
    FROM (
        SELECT
            id,
            jpeg2pattern(data) AS pattern
        FROM
            image
    ) x
);
Create the indexes
ALTER TABLE pat ADD PRIMARY KEY (id);

CREATE INDEX pat_signature_idx ON pat USING gist (signature);
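Similarity query: for example, find the images most similar to the image with id = :id, rank them by similarity, and return the top 10
SELECT
    id,
    smlr
FROM
(
    SELECT
        id,
        pattern <-> (SELECT pattern FROM pat WHERE id = :id) AS smlr
    FROM pat
    WHERE id <> :id
    ORDER BY
        signature <-> (SELECT signature FROM pat WHERE id = :id)
    LIMIT 100
) x
ORDER BY x.smlr ASC
LIMIT 10;
The inner query picks the 100 nearest candidates by signature distance, and the outer query re-ranks them by the more precise pattern distance. The ORDER BY on signature can use the KNN (GiST) index, so results ranked by similarity are returned quickly.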
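The two-stage shape of this query (a cheap coarse ranking on the short signature, then a precise re-ranking on the full pattern) can be sketched outside the database. The following Python is only an in-memory illustration under stated assumptions: signatures and patterns are modeled as plain lists of floats, the linear scan stands in for the GiST index, and `find_similar` and its parameters are hypothetical names, not part of imgsmlr.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_similar(images, query_id, coarse_limit=100, final_limit=10):
    """images maps id -> (signature, pattern); returns [(id, pattern_distance)]."""
    query_sig, query_pat = images[query_id]
    others = [i for i in images if i != query_id]
    # Stage 1: keep the coarse_limit nearest ids by signature distance.
    candidates = sorted(
        others, key=lambda i: distance(images[i][0], query_sig)
    )[:coarse_limit]
    # Stage 2: re-rank only those candidates by pattern distance.
    ranked = sorted(
        ((i, distance(images[i][1], query_pat)) for i in candidates),
        key=lambda pair: pair[1],
    )
    return ranked[:final_limit]


# Toy data: short "signatures" and longer "patterns" (made-up values).
images = {
    1: ([0.0, 0.0], [0.0, 0.0, 0.0, 0.0]),
    2: ([0.1, 0.0], [0.5, 0.0, 0.0, 0.0]),
    3: ([0.2, 0.0], [0.1, 0.0, 0.0, 0.0]),
    4: ([5.0, 5.0], [0.0, 0.0, 0.0, 0.0]),
}
result = find_similar(images, 1, coarse_limit=2, final_limit=2)
print([image_id for image_id, _ in result])  # [3, 2]
```

In the toy data, image 4 has the same pattern as the query but a distant signature, so the coarse stage drops it. That is the trade-off of this design: the coarse limit should be comfortably larger than the final limit, as with 100 versus 10 in the query.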
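Examples
Video deduplication
To deduplicate videos, extract the key frames from each video, self-join them to produce a Cartesian product, and compute the similarity between any two frames that belong to different videos. When two frames are similar enough, that is, when their distance falls below a chosen threshold, the videos can be treated as the same.
Example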
-- Create the image table and load the key frames of all videos into it
create table image (id serial8 primary key, movie_id int, data bytea);

-- Import the images, assuming JPEG format
-- ... omitted ...

-- Generate pattern and signature
CREATE TABLE pat AS (
    SELECT
        id, movie_id,
        shuffle_pattern(pattern) AS pattern,
        pattern2signature(pattern) AS signature
    FROM (
        SELECT
            id, movie_id,
            jpeg2pattern(data) AS pattern
        FROM
            image
    ) x
);

-- Compute the similarity between frames of different videos.
-- <-> returns a Euclidean distance, so smaller values mean more similar frames.
-- Joining on t1.movie_id < t2.movie_id lists each pair of videos only once.
select t1.movie_id, t1.id, t2.movie_id, t2.id, t1.signature<->t2.signature as dist from
pat t1 join pat t2 on (t1.movie_id<t2.movie_id)
order by dist asc;

-- or, keeping only the pairs closer than a threshold

select t1.movie_id, t1.id, t2.movie_id, t2.id, t1.signature<->t2.signature as dist from
pat t1 join pat t2 on (t1.movie_id<t2.movie_id)
where t1.signature<->t2.signature < 0.9  -- example threshold; tune it for your data
order by dist asc;
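The same cross-video comparison can be sketched in Python to show the logic independently of SQL. This is a minimal illustration under assumptions: key-frame signatures are modeled as short lists of floats with made-up values, the threshold is arbitrary, and `duplicate_movie_pairs` is a hypothetical helper name.

```python
import math
from itertools import combinations


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def duplicate_movie_pairs(frames, threshold):
    """frames is a list of (movie_id, frame_id, signature).

    Returns the set of (movie_a, movie_b) pairs, movie_a < movie_b, that
    share at least one pair of key frames closer than the threshold.
    """
    pairs = set()
    for (m1, _, s1), (m2, _, s2) in combinations(frames, 2):
        # Only compare frames that belong to different movies.
        if m1 != m2 and distance(s1, s2) < threshold:
            pairs.add((min(m1, m2), max(m1, m2)))
    return pairs


frames = [
    (1, 1, [0.0, 0.0]),
    (1, 2, [1.0, 1.0]),
    (2, 3, [0.0, 0.05]),   # nearly identical to frame 1 of movie 1
    (3, 4, [9.0, 9.0]),
]
print(duplicate_movie_pairs(frames, threshold=0.1))  # {(1, 2)}
```

A single matching frame is a weak signal, since two different films can share a black frame or a studio logo. A stricter variant could count the matching frame pairs per pair of movies and require several before declaring a duplicate.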
Summary
1. PostgreSQL is a very powerful and highly customizable database. Customizing it does not require touching the PostgreSQL kernel, so it remains safe and reliable.
2. The image search technique used in this article is one example of extending PostgreSQL, and it is fast. Recall the performance figures I gave earlier for nearest-neighbor location queries:
"PostgreSQL: nearest-neighbor queries over ten billion location records with millisecond response"
3. If you are interested in developing PostgreSQL extensions, see my article
"Find the G-spot of your business and enjoy the thrill - PostgreSQL Kernel Extension Guide"