PostgreSQL applied to video and image deduplication and image search

Abstract: Applications of PostgreSQL in video and image deduplication and image search. digoal, 2016-11-26. Tags: PostgreSQL, Haar wavelet, image search, image deduplication, video deduplication. Background: many business scenarios involve image processing, such as image search, video deduplication, face recognition, photo beautification (Meitu), image deduplication ...


Author

digoal

Date

2016-11-26

Tags

PostgreSQL, Haar wavelet, image search, image deduplication, video deduplication


Background

Many business scenarios involve image processing, such as image search, video deduplication, face recognition, photo beautification (Meitu), and image deduplication.

Take video deduplication: users upload many videos, and there may be several versions of the same movie with different resolutions, different audio tracks, and different compression ratios. As a result, the server ends up storing large amounts of duplicate video.

Another example is screening pornographic videos and pictures. If this can be automated, the profession of the human porn reviewer may disappear.

Is there a way to find duplicate videos? How can pornographic videos and pictures be identified? This article reveals the answer.

On the other hand, image search is the most commonly used kind of search engine after text search.

Common image search engines on the market include those of Google, Baidu, and Sogou.

http://image.baidu.com/

http://images.google.com.hk

For example, upload a picture of a snowman through the interface the search engine provides, and it returns a pile of pictures similar to the snowman.

screenshot

How is image search done?

The all-capable PostgreSQL does not miss out on something this fun: through PG's general-purpose extension API, image search functionality can be added to it.

If you are interested in developing PostgreSQL extensions, see the article I wrote:

"Find the business G-spot for a thrilling experience - PostgreSQL kernel extension guide"

Background of the PostgreSQL image search plugin

The PostgreSQL image search plugin stores images after applying the Haar wavelet transform, a very mainstream technique. For the literature on the Haar wavelet, see the Wikipedia article and the following material:

https://en.wikipedia.org/wiki/Haar_wavelet

http://www.cs.toronto.edu/~kyros/courses/320/Lectures.2013s/lecture.2013s.10.pdf

https://wiki.postgresql.org/images/4/43/Pgcon_2013_similar_images.pdf

Below are a few excerpted pages. Be prepared, they take some brainpower.

screenshot

screenshot

screenshot

screenshot
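To make the transform concrete, here is a minimal Python sketch of the textbook orthonormal Haar decomposition applied to a toy 4x4 grayscale matrix. The sketch first transforms every row, then every column. It illustrates the technique described in the references and is not imgsmlr's C code. The image size, the 1/sqrt(2) scaling, and the choice of the standard (rather than non-standard) 2-D decomposition are assumptions.

```python
import math

def haar_1d(values):
    """Full 1-D Haar decomposition with orthonormal 1/sqrt(2) scaling.

    Each pass replaces the first `length` entries by length/2 pairwise
    averages followed by length/2 pairwise differences, then repeats on
    the averages until a single overall average remains.
    """
    out = [float(v) for v in values]
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    length = n
    while length > 1:
        half = length // 2
        prev = out[:length]
        for i in range(half):
            a, b = prev[2 * i], prev[2 * i + 1]
            out[i] = (a + b) / math.sqrt(2)         # low-pass (average) part
            out[half + i] = (a - b) / math.sqrt(2)  # high-pass (detail) part
        length = half
    return out

def haar_2d(matrix):
    """Standard 2-D decomposition: transform every row, then every column."""
    rows = [haar_1d(row) for row in matrix]
    cols = [haar_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

if __name__ == "__main__":
    # Toy 4x4 "grayscale image": dark left half, bright right half.
    img = [[0, 0, 8, 8]] * 4
    for row in haar_2d(img):
        print([round(x, 3) for x in row])
```

For this half-dark, half-bright image, the first row prints `[16.0, -16.0, 0.0, 0.0]` and all other coefficients are zero. The two nonzero values are the overall average and one horizontal detail coefficient. Because the 1/sqrt(2) scaling makes the transform orthonormal, the sum of squared coefficients equals the sum of squared pixel values.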

Introduction to the PostgreSQL image search plugin

It depends on gd.h (the headers of the GD graphics library):

# yum install -y gd-devel

Download and install imgsmlr:

  $ git clone https://github.com/postgrespro/imgsmlr
  $ cd imgsmlr
  $ export PGHOME=/home/digoal/pgsql9.5
  $ export PATH=$PGHOME/bin:$PATH:.

  $ make USE_PGXS=1
  $ make USE_PGXS=1 install

Install the extension:

  $ psql
  psql (9.5.3)
  Type "help" for help.
  postgres=# create extension imgsmlr;
  CREATE EXTENSION

imgsmlr adds two data types:

Datatype  | Storage length | Description
pattern   | 16388 bytes    | Result of Haar wavelet transform on the image
signature | 64 bytes       | Short representation of pattern for fast search using GiST indexes

It also adds a GiST index method (supporting the pattern and signature types) and a KNN operator, which can be used for similarity search:

Operator | Left type | Right type | Return type | Description
<->      | pattern   | pattern    | float8      | Euclidean distance between two patterns
<->      | signature | signature  | float8      | Euclidean distance between two signatures
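Since `<->` returns a Euclidean distance, a smaller value means the images are more similar, and identical vectors give 0. The following minimal Python sketch models the operator and an `ORDER BY col <-> query` ranking on plain float lists. It is an assumption-level model of the documented semantics and says nothing about how imgsmlr stores or indexes the values.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors, as <-> is described."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_by_distance(query, vectors):
    """(index, distance) pairs, nearest first: the analogue of ORDER BY col <-> query."""
    scored = ((i, euclidean(query, v)) for i, v in enumerate(vectors))
    return sorted(scored, key=lambda pair: pair[1])

if __name__ == "__main__":
    print(rank_by_distance([0.0, 0.0], [[3.0, 4.0], [0.0, 1.0], [0.0, 0.0]]))
```

This prints `[(2, 0.0), (1, 1.0), (0, 5.0)]`: the identical vector comes first. This is why "most similar" corresponds to an ascending sort on the distance.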

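Several new functions are added as well. They convert binary image data into the pattern type, and convert the data stored in a pattern into the signature type: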

Function                   | Return type | Description
jpeg2pattern(bytea)        | pattern     | Convert jpeg image into pattern
png2pattern(bytea)         | pattern     | Convert png image into pattern
gif2pattern(bytea)         | pattern     | Convert gif image into pattern
pattern2signature(pattern) | signature   | Create signature from pattern
shuffle_pattern(pattern)   | pattern     | Shuffle pattern for less sensitivity to image shift

Testing the PostgreSQL image search plugin

Import some pictures, for example (the more the better):

screenshot

Create the image table:

create table image (id serial, data bytea); 

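Import the pictures into the database. Note that pg_read_binary_file reads a file on the database server, not on the client, and is restricted to superusers by default:

insert into image(data) select pg_read_binary_file('file_path');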
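Importing many pictures one statement at a time is tedious. A small hypothetical Python helper can generate the INSERT statements for a list of JPEG paths, and its output can then be fed to psql. The function name, the suffix list, and the directory walk are illustrative assumptions. The paths must exist on the database server, because pg_read_binary_file reads server-side files. Doubling single quotes is enough to escape the paths as SQL literals when standard_conforming_strings is on, which is the default since PostgreSQL 9.1.

```python
import sys
from pathlib import Path

# Hypothetical choice: only JPEG files, since the later steps use jpeg2pattern().
IMAGE_SUFFIXES = (".jpg", ".jpeg")

def import_statements(paths):
    """Return one INSERT ... pg_read_binary_file(...) statement per JPEG path."""
    stmts = []
    for p in paths:
        if Path(p).suffix.lower() in IMAGE_SUFFIXES:
            literal = str(p).replace("'", "''")  # escape quotes for a SQL string literal
            stmts.append(
                f"insert into image(data) select pg_read_binary_file('{literal}');"
            )
    return stmts

if __name__ == "__main__":
    # Walk a directory (default: current one) and print the statements.
    root = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    files = sorted(p for p in root.rglob("*") if p.is_file())
    print("\n".join(import_statements(files)))
```

If the script is saved under a hypothetical name such as `gen_inserts.py` and run on the database host, its output can be piped into psql: `python3 gen_inserts.py /some/image/dir | psql`.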

Convert the pictures to pattern and signature:

  CREATE TABLE pat AS (
    SELECT
      id,
      shuffle_pattern(pattern) AS pattern,
      pattern2signature(pattern) AS signature
    FROM (
      SELECT
        id,
        jpeg2pattern(data) AS pattern
      FROM
        image
    ) x
  );

Create indexes:

  ALTER TABLE pat ADD PRIMARY KEY (id);

  CREATE INDEX pat_signature_idx ON pat USING gist (signature);

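Similarity query: for example, find the images most similar to the image with id = :id (a psql variable), rank them by similarity, and return the top 10: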

SELECT
    id,
    smlr
FROM
(
    SELECT
        id,
        pattern <-> (SELECT pattern FROM pat WHERE id = :id) AS smlr
    FROM pat
    WHERE id <> :id
    ORDER BY signature <-> (SELECT signature FROM pat WHERE id = :id)
    LIMIT 100
) x
ORDER BY x.smlr ASC
LIMIT 10;

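Here the KNN capability of the GiST index is used. The inner query orders by signature distance, which the index on signature can serve directly, so it fetches the 100 nearest candidates quickly. The outer query then re-ranks those candidates by the full pattern distance and keeps the best 10.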
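The two-level query can be modeled in a few lines of brute-force Python to show what each level contributes. In this sketch, plain float lists stand in for pattern and signature, a dict stands in for the pat table, and `dist` is ordinary Euclidean distance. No index is involved, so this illustrates the query's semantics under those assumptions, not its speed.

```python
import heapq
import math

def dist(u, v):
    """Euclidean distance between two equal-length float vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def two_stage_search(rows, query_id, candidates=100, limit=10):
    """Brute-force model of the two-level SQL query.

    rows: dict mapping id -> (pattern, signature), both lists of floats.
    """
    q_pattern, q_signature = rows[query_id]
    others = (i for i in rows if i != query_id)  # WHERE id <> :id
    # Inner query: the `candidates` nearest rows by the cheap signature distance
    # (in PostgreSQL, this ORDER BY ... LIMIT is what the GiST KNN index serves).
    shortlist = heapq.nsmallest(candidates, others,
                                key=lambda i: dist(rows[i][1], q_signature))
    # Outer query: re-rank the shortlist by the full pattern distance.
    scored = [(i, dist(rows[i][0], q_pattern)) for i in shortlist]
    return sorted(scored, key=lambda pair: pair[1])[:limit]

if __name__ == "__main__":
    toy = {
        1: ([0.0, 0.0], [0.0]),
        2: ([1.0, 0.0], [1.0]),
        3: ([5.0, 5.0], [0.5]),
        4: ([0.0, 1.0], [3.0]),
    }
    print(two_stage_search(toy, query_id=1, candidates=2, limit=1))
```

With `candidates=2`, the shortlist holds ids 3 and 2, so the call prints `[(2, 1.0)]`. Id 4 has a pattern just as close as id 2, but it is never considered because its signature looks far away. This is the trade-off behind the inner `LIMIT 100`: a larger candidate pool costs more pattern comparisons but loses fewer true neighbors.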

Examples

[Nine example screenshots]

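Video deduplication

For video deduplication, extract the key frames of each video and self-join the frame table (a Cartesian product) to compute the similarity of every pair of frames from different videos. If the similarity reaches a certain threshold, the videos can be considered the same. Because `<->` is a distance, "similar enough" means the distance is below the threshold.

Example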

  -- Create the image table and import the key frames of all videos into it
  create table image (id serial8 primary key, movie_id int, data bytea);

  -- Import the pictures, assumed to be in JPEG format
  -- ... omitted ...

  -- Generate pattern and signature
  CREATE TABLE pat AS (
    SELECT
      id, movie_id,
      shuffle_pattern(pattern) AS pattern,
      pattern2signature(pattern) AS signature
    FROM (
      SELECT
        id, movie_id,
        jpeg2pattern(data) AS pattern
      FROM
        image
    ) x
  );

  -- Compute the similarity between frames of different videos.
  -- <-> is a distance, so smaller means more similar; t1.movie_id < t2.movie_id
  -- keeps each pair of videos only once instead of in both orders.
  select t1.movie_id as movie1, t1.id as frame1,
         t2.movie_id as movie2, t2.id as frame2,
         t1.signature <-> t2.signature as dist
  from pat t1 join pat t2 on (t1.movie_id < t2.movie_id)
  order by dist asc;

  -- or keep only frame pairs closer than a threshold (0.9 here; tune it for your data)

  select t1.movie_id as movie1, t1.id as frame1,
         t2.movie_id as movie2, t2.id as frame2,
         t1.signature <-> t2.signature as dist
  from pat t1 join pat t2 on (t1.movie_id < t2.movie_id)
  where t1.signature <-> t2.signature < 0.9
  order by dist asc;
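A brute-force Python model of the same comparison can help in reasoning about thresholds. It compares every pair of key frames from different movies, which is the same O(n²) work as the SQL self-join. It then counts the close frame pairs for each movie pair and flags the pairs with enough matches. The `min_matches` rule, the function names, and the toy two-dimensional signatures are hypothetical and only for illustration. The SQL in the article stops at listing close frame pairs.

```python
import math
from collections import Counter
from itertools import combinations

def dist(u, v):
    """Euclidean distance between two equal-length float vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def matching_frame_counts(frames, threshold):
    """Count close key-frame pairs for every pair of different movies.

    frames: iterable of (movie_id, frame_id, signature) tuples.
    """
    counts = Counter()
    for (m1, _, s1), (m2, _, s2) in combinations(frames, 2):
        # Same condition as the SQL join plus WHERE: different movies, close frames.
        if m1 != m2 and dist(s1, s2) < threshold:
            counts[(min(m1, m2), max(m1, m2))] += 1  # one key per unordered movie pair
    return counts

def likely_duplicates(frames, threshold, min_matches):
    """Movie pairs with at least `min_matches` close frame pairs (hypothetical rule)."""
    counts = matching_frame_counts(frames, threshold)
    return sorted(pair for pair, n in counts.items() if n >= min_matches)

if __name__ == "__main__":
    frames = [
        (1, 10, [0.0, 0.0]), (1, 11, [5.0, 5.0]),
        (2, 20, [0.1, 0.0]), (2, 21, [5.0, 5.1]),
        (3, 30, [9.0, -9.0]),
    ]
    print(likely_duplicates(frames, threshold=0.5, min_matches=2))
```

In the toy data, movies 1 and 2 share two nearly identical key frames, so the script prints `[(1, 2)]`. Requiring several matching frames rather than one is a design choice. It makes a single coincidentally similar frame (a black screen, say) less likely to mark two different movies as duplicates.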

Summary

1. PostgreSQL is a very powerful database with highly customizable features. Adding functionality such as image search does not require modifying the PostgreSQL kernel, so it is safe and reliable.

2. Image search is one example of using PostgreSQL extensions, and its speed is excellent. Recall the performance figures for location-based nearest-neighbor queries that I gave in an earlier article:

"PostgreSQL: nearest-neighbor queries over ten billion location records with millisecond response"

3. If you are interested in developing PostgreSQL extensions, see the article I wrote:

"Find the business G-spot for a thrilling experience - PostgreSQL kernel extension guide"


Source: www.cnblogs.com/Amos-Turing/p/11419328.html