[Pgday.Seoul 2017] 6. GIN vs GiST Index Story - Park Jin-woo

GIN vs. GiST Index Story
Gaia3D, Inc.
Park Jin-woo (swat018@gmail.com)
2017. 11. 04

Contents
1. Index
2. Heap
3. B-tree and GIN
4. R-tree and GiST
5. Summary

Why Index?
• Spatial Index
• Visibility Index
• Full Text Search

Index
An index holds mapping information for the specified column: it maps column values to the rows that contain them.
Ex) CREATE INDEX test1_id_index ON test1 (id);

Index
PostgreSQL supports the following index types.
• B-tree: numbers, text, dates, etc.
• Generalized Inverted Index (GIN)
• Generalized Search Tree (GiST)
• Space-Partitioned GiST (SP-GiST)
• Block Range Index (BRIN)
• Hash

Heap
What is a heap?
: The form in which a table is stored, with no sort order imposed on its rows
[Figure: a heap file drawn as Block 0 to Block 4, each block holding items 0 to 3]

Heap
[Figure: heap Block 0 to Block 4, each block holding items 0 to 3]
TID: Physical location of a heap tuple
ex) Berlin: item 2 of Block 0.
Item pointer: Berlin → (0,2)
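The TID lookup above can be sketched in a few lines of Python. This is only a toy model, assuming the heap is a list of blocks and each block is a list of item slots; the names heap and fetch are hypothetical, and the item numbering follows the slide's figure (real PostgreSQL item numbers start at 1).

```python
# Toy heap mirroring the slide's figure: 5 blocks, 4 item slots each.
# Only "Berlin" comes from the slide; every other slot is left empty.
heap = [[None] * 4 for _ in range(5)]
heap[0][2] = "Berlin"


def fetch(heap, tid):
    """Return the tuple stored at tid = (block number, item number)."""
    block_no, item_no = tid
    return heap[block_no][item_no]


berlin_tid = (0, 2)
print(fetch(heap, berlin_tid))  # Berlin
```

An index entry only has to store the key and this (block, item) pair; the row itself stays in the heap.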

Heap
• A table file consists of n blocks.
• The default size of a block (page) is 8192 bytes (8 KB).
• A page consists of header info, record data, and free space.
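As a small worked example of the block arithmetic, the sketch below assumes the default 8192-byte block size from the slide and a table file that fits in a single file on disk. The function names block_count and block_offset are hypothetical.

```python
BLOCK_SIZE = 8192  # default block (page) size in bytes, as stated on the slide


def block_count(file_size_bytes):
    """Number of whole blocks in a table file of the given size."""
    return file_size_bytes // BLOCK_SIZE


def block_offset(block_no):
    """Byte offset at which block number block_no starts within the file."""
    return block_no * BLOCK_SIZE


print(block_count(40960))  # 5
print(block_offset(3))     # 24576
```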

B-tree
• The default index type
• Usage
postgres=# CREATE INDEX indexname ON tablename (columnname);
Ex) CREATE INDEX test1_id_index ON test1 (id);
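A B-tree keeps its keys in sorted order, which is what makes equality and range lookups cheap. The sketch below is not a real B-tree: it assumes a flat sorted list searched with Python's bisect module as a stand-in for the leaf level, and the (id, TID) pairs are made-up illustration data.

```python
from bisect import bisect_left

# Hypothetical (id, TID) pairs, kept sorted by id like B-tree leaf entries.
entries = [(10, (0, 1)), (20, (0, 3)), (30, (1, 0)), (40, (2, 2))]
keys = [key for key, _ in entries]


def lookup(key):
    """Return the TID stored for key, or None if the key is absent."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return entries[i][1]
    return None


def range_scan(lo, hi):
    """Return the TIDs of all keys with lo <= key < hi, in key order."""
    start, stop = bisect_left(keys, lo), bisect_left(keys, hi)
    return [tid for _, tid in entries[start:stop]]


print(lookup(30))          # (1, 0)
print(range_scan(15, 40))  # [(0, 3), (1, 0)]
```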

GIN
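• Generalized Inverted Index (GIN)
[Figure: one index entry per row compared with one entry per key plus a posting list]
One entry per row: Seoul (0,12), Seoul (4,2), Seoul (1,9), Seoul (4,1), Busan (2,2)
Posting list: Seoul → (0,12), (4,2), (1,9), (4,1); Busan → (2,2)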
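The idea of a posting list can be sketched as follows: each distinct key is stored once, together with the list of TIDs of the rows that contain it. This is a minimal illustration using the slide's per-row entries, not how GIN lays out its pages; build_posting_lists is a hypothetical name.

```python
from collections import defaultdict

# Per-row (key, TID) entries taken from the slide.
rows = [
    ("Seoul", (0, 12)),
    ("Seoul", (4, 2)),
    ("Seoul", (1, 9)),
    ("Seoul", (4, 1)),
    ("Busan", (2, 2)),
]


def build_posting_lists(rows):
    """Group TIDs by key so that each key appears only once."""
    index = defaultdict(list)
    for key, tid in rows:
        index[key].append(tid)
    return dict(index)


index = build_posting_lists(rows)
print(index["Seoul"])  # [(0, 12), (4, 2), (1, 9), (4, 1)]
print(index["Busan"])  # [(2, 2)]
```

Storing a frequently repeated key once is why this layout suits data with many duplicates.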

GIN
1. Text retrieval
postgres=# -- create a table with a text column
postgres=# CREATE TABLE t1 (id serial, t text);
CREATE TABLE
postgres=# CREATE INDEX t1_idx ON t1 USING gin (to_tsvector('english', t));
CREATE INDEX
postgres=# INSERT INTO t1 VALUES (1, 'a fat cat sat on a mat and ate a fat rat');
INSERT 0 1
postgres=# INSERT INTO t1 VALUES (2, 'a fat dog sat on a mat and ate a fat chop');
INSERT 0 1
postgres=# -- is there a row where column t contains the two words? (syntax contains some magic to hit index)
postgres=# SELECT * FROM t1 WHERE to_tsvector('english', t) @@ to_tsquery('fat & rat');
 id |                    t
----+------------------------------------------
  1 | a fat cat sat on a mat and ate a fat rat
(1 row)
postgres=# CREATE INDEX indexname ON tablename USING GIN (columnname);
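The 'fat & rat' query above can be pictured as intersecting two posting lists. The sketch below assumes plain whitespace tokenisation with no stemming or stop-word removal, so it is much simpler than to_tsvector; build_index and search_all are hypothetical names.

```python
# The two rows inserted into t1 on the slide, keyed by id.
docs = {
    1: "a fat cat sat on a mat and ate a fat rat",
    2: "a fat dog sat on a mat and ate a fat chop",
}


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.split()):
            index.setdefault(word, set()).add(doc_id)
    return index


def search_all(index, *words):
    """Return ids of documents containing every given word (an AND query)."""
    postings = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*postings)) if postings else []


index = build_index(docs)
print(search_all(index, "fat", "rat"))  # [1]
```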

GIN
2. Array
postgres=# -- create a table where one column consists of an integer array
postgres=# --
postgres=# CREATE TABLE t2 (id serial, temperatures INTEGER[]);
CREATE TABLE
postgres=# CREATE INDEX t2_idx ON t2 USING gin (temperatures);
CREATE INDEX
postgres=# INSERT INTO t2 VALUES (1, '{11, 12, 13, 14}');
INSERT 0 1
postgres=# INSERT INTO t2 VALUES (2, '{21, 22, 23, 24}');
INSERT 0 1
postgres=# -- Is there a row with the two array elements 12 and 11?
postgres=# SELECT * FROM t2 WHERE temperatures @> '{12, 11}';
 id | temperatures
----+---------------
  1 | {11,12,13,14}
(1 row)
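The array case follows the same pattern: every array element becomes a key, and the contains operator @> is answered by intersecting the posting lists of the requested elements. This is a conceptual sketch only; build_element_index and rows_containing are hypothetical names.

```python
# The two rows inserted into t2 on the slide, keyed by id.
rows = {
    1: [11, 12, 13, 14],
    2: [21, 22, 23, 24],
}


def build_element_index(rows):
    """Map each array element to the set of row ids whose array holds it."""
    index = {}
    for row_id, values in rows.items():
        for value in values:
            index.setdefault(value, set()).add(row_id)
    return index


def rows_containing(index, all_ids, wanted):
    """Row ids whose array contains every element of wanted."""
    result = set(all_ids)  # an empty wanted list is contained in every array
    for value in wanted:
        result &= index.get(value, set())
    return sorted(result)


index = build_element_index(rows)
print(rows_containing(index, rows.keys(), [12, 11]))  # [1]
```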

GiST
• Supports operators such as “contains”, “left of”, and “overlaps”.
• Used for full text search, geometric operations (PostGIS, etc.), and range handling (time, etc.).
• Supports KNN search and is built on B-tree- and R-tree-style structures.

R-tree(Rectangle-tree)
Linear Indexing

R-tree(Rectangle-tree)
Multi-Dimensional
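An R-tree groups nearby objects under bounding rectangles, and a search skips any subtree whose rectangle cannot match the query. The sketch below shows only the three rectangle operations such a tree relies on. It is an illustration under simple assumptions (axis-aligned rectangles, closed boundaries), and Rect, union, overlaps, and contains are hypothetical names.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def union(a, b):
    """Smallest rectangle covering both a and b (a parent node's bounding box)."""
    return Rect(min(a.xmin, b.xmin), min(a.ymin, b.ymin),
                max(a.xmax, b.xmax), max(a.ymax, b.ymax))


def overlaps(a, b):
    """True if a and b share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def contains(outer, inner):
    """True if inner lies entirely inside outer."""
    return (outer.xmin <= inner.xmin and inner.xmax <= outer.xmax and
            outer.ymin <= inner.ymin and inner.ymax <= outer.ymax)


a, b, query = Rect(0, 0, 2, 2), Rect(1, 1, 3, 3), Rect(5, 5, 6, 6)
parent = union(a, b)
print(parent)                   # Rect(xmin=0, ymin=0, xmax=3, ymax=3)
print(overlaps(parent, query))  # False, so the whole subtree can be skipped
```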

GiST
postgres=# CREATE INDEX indexname ON tablename USING GIST (columnname);
postgres=# -- create a table with a column of non-trivial type
postgres=# --
postgres=# CREATE TABLE t3 (id serial, c circle);
CREATE TABLE
postgres=# CREATE INDEX t3_idx ON t3 USING gist(c);
CREATE INDEX
postgres=# INSERT INTO t3 VALUES (1, circle '((0, 0), 0.5)');
INSERT 0 1
postgres=# INSERT INTO t3 VALUES (2, circle '((1, 0), 0.5)');
INSERT 0 1
postgres=# INSERT INTO t3 VALUES (3, circle '((0.3, 0.3), 0.3)');
INSERT 0 1
postgres=# -- which circles lie in the bounds of the unit circle?
postgres=# SELECT * FROM t3 WHERE circle '((0, 0), 1)' @> c;
 id |        c
----+-----------------
  1 | <(0,0),0.5>
  3 | <(0.3,0.3),0.3>
(2 rows)
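The result of the query above can be checked by hand with the usual geometric test: one circle contains another when the distance between the centres plus the inner radius does not exceed the outer radius. The sketch below applies that test to the three circles from the slide. It is a plain geometric check written for illustration, not PostgreSQL's own code, and circle_contains is a hypothetical name.

```python
from math import hypot

# The three rows inserted into t3 on the slide: id -> ((x, y), radius).
circles = {
    1: ((0.0, 0.0), 0.5),
    2: ((1.0, 0.0), 0.5),
    3: ((0.3, 0.3), 0.3),
}


def circle_contains(outer, inner):
    """True if the inner circle lies entirely within the outer circle."""
    (ox, oy), outer_r = outer
    (ix, iy), inner_r = inner
    return hypot(ix - ox, iy - oy) + inner_r <= outer_r


unit_circle = ((0.0, 0.0), 1.0)
result = [cid for cid, c in circles.items() if circle_contains(unit_circle, c)]
print(result)  # [1, 3]
```

Circle 2 is excluded because its centre is 1 away from the origin and its radius adds another 0.5, which reaches beyond the unit circle.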

summary
• B-tree is ideal for unique values.
• GIN is ideal for indexes with many duplicates.
• GiST is for everything else.
Experiments lead to the following observations:
• Creation time: GIN takes 3 times longer to build than GiST.
• Index size: GIN is 2-3 times larger than GiST.
• Search time: GIN is 3 times faster than GiST.
• Update time: GIN is about 10 times slower than GiST.

Thank you for listening.
swat018@gmail.com
