Pull to refresh
50.79

Партиционирование в postgres 9.x. Использование pg_pathman для оптимизации вставки и отсечения (pruning) партиций

Reading time 59 min
Views 7.8K
Здравствуйте! Хочу рассказать про особенности партиционирования в текущей postgresql 9.х и его улучшении с помощью расширения pg_pathmanвот), созданного парнями из Postgres Professional. Статья предназначена для знакомых с партиционированием разработчиков, которым понадобилось разбить большую БД в postgres, или для тех, кто хочет оценить сложность переноса уже партиционированной не postgres БД на postgres.

Сначала мы создадим схему БД, затем партиционируем её двумя способами(«штатным» и pg_pathman), после чего наполним данными и проверим, как работают запросы по партиционированным таблицам.

Также я расскажу, как это замечательное расширение внедрить в схему данных, уже побитую на партиции «штатным» способом.

1. Создаём схему данных


Наша небольшая схема будет состоять из трёх таблиц:

• партиционированная на три части таблица parent,
• ссылающая на неё партиционированная таким же образом таблица child,
• обычная таблица неких результатов result со ссылками на записи parent с номерами партиций.

Таблицы parent и child партиционированы по ключу partition_id. В эти таблицы добавлены столбцы с некими данными parent_data и child_data соответственно. А также в таблицу child добавлено индексированное поле child_index для использования в подзапросах.


create table public.parent 
(
  parent_id numeric not null,
  parent_data varchar(100),
  partition_id numeric not null,
  primary key (parent_id)
);

create table public.child 
(
  child_id numeric not null,
  parent_id numeric not null,
  child_data varchar(100),
  child_index numeric,
  partition_id numeric not null,
  primary key (child_id)
);
create index ix_child_parent on public.child(parent_id);
create index ix_child_index on public.child(child_index);

create table public.result 
(
  result_id numeric not null,
  parent_id numeric not null,
  partition_id numeric not null
);
create index idx_result on public.result(result_id);

2. Партиционируем схему «штатным» способом


Итак, если ваш глаз зацепился за заголовок и вы всё-таки решили посмотреть эту статью, то наверняка знаете, что встроенного партиционирования в postgres пока нет. А вместо него предлагается использовать имеющийся механизм наследования.

Вот как это выглядит:

create table public.parent_1 ( like public.parent including indexes, check(partition_id = 1) ) inherits (public.parent);
create table public.parent_2 ( like public.parent including indexes, check(partition_id = 2) ) inherits (public.parent);
create table public.parent_3 ( like public.parent including indexes, check(partition_id = 3) ) inherits (public.parent);
create table public.child_1 ( like public.child including indexes, check(partition_id = 1)) inherits (public.child);
create table public.child_2 ( like public.child including indexes, check(partition_id = 2)) inherits (public.child);
create table public.child_3 ( like public.child including indexes, check(partition_id = 3)) inherits (public.child);

create or replace function public.trigger_function() returns trigger as $trg_func$
begin
  execute 'insert into ' || tg_table_name || '_' || new.partition_id ||' values ($1.*)'
  using new;

  return null;
end;$trg_func$ language plpgsql security definer;

create trigger parent_trigger before insert on public.parentfor each row execute procedure public.trigger_function();

create trigger child_trigger before insert on public.childfor each row execute procedure public.trigger_function();

Партиционируем наши две таблицы на три части каждую, большее количество партиций я делать не стал, иначе планы запросов превратят статью в бесконечную простыню. Таблицы наследуем в режиме like including indexes, чтобы primary key и индексы скопировались в наследуемые таблицы. Глобальной уникальности локальные PK не дадут, зато будет локальная уникальность в партициях и локальные же индексы. Также в наследуемых таблицах создаётся ограничение на идентификатор партиции, чтобы работал штатный механизм их отсечения.
Для раскидывания записей по партициям при вставке используется механизм из триггеров и универсальной триггерной функции public.trigger_function. Минусом такого механизма является то, что при вставке в главную таблицу количество вставленных записей возвращается 0.

3. Партиционируем схему используя pg_pathman


Партиционирование таблиц с этим расширением в постгрес делается вызовом специальной функции create_range_partitions:

select public.create_range_partitions(c.oid, 'partition_id', 1, 1, 3)
from pg_class c
inner join pg_namespace n on c.relnamespace = n.oid and n.nspname = 'public'
where c.relname in ('parent', 'child');

При таком партиционировании в таблицы-партиции копируются все индексы из исходных таблиц, а также создаются специальные ограничения для работы этого расширения.

4. Методика тестирования


Перед тем как показывать результаты, опишу стенд и методику тестирования.
Стенд, на котором я тестировал — это виртуалка VMware с 2 гигабайтами оперативной памяти. На ней стоит Red Hat Server 6.7, Postresql 9.6.3 и pg_pathman 1.4.2. В конфиге постгреса shared_buffers = 512M, effective_cache_size = 1GB, constraint_exclusion = partition.
Блок вставок и каждый запрос тестировался следующим образом:

1. При тестировании запроса сначала выполняется сам же запрос для того, чтобы его данные подкачались в дисковый кэш host windows системы, для вставки ничего не делается. Скинуть этот кэш для файлов виртуалки и оставить его для остальных приложений у меня не получилось, полное же скидывание этого кэша вызывало дикие тормоза и не давало корректно что-либо измерить. Поэтому на время выполнения запросов сильно ориентироваться не будем, будем смотреть количество блоков данных, просмотренных запросами. Это количество на обычной большой системе будет пропорционально количеству дисковых чтений, а значит и времени выполнения запроса там же.

2. Скидываются кэши линукса командой echo 3 > /proc/sys/vm/drop_caches.

3. Перезапускается postgres для того, чтобы скинуть уже его кэши.

4. Выполняется запрос или вставка с предшествующим оператором explain (analyse, verbose, buffers) для определения плана, времени и остальных характеристик их выполнения.

5. Вставляем данные


Данные вставляем следующим скриптом:

insert into public.parent (parent_id, parent_data, partition_id)
select a, gen_random_bytes(25), trunc(random() * 3) + 1
from generate_series(1, 10000000) a;

insert into public.child (child_id, parent_id, child_data, child_index, partition_id)
select parent_id * 20 + a, parent_id, gen_random_bytes(25),  trunc(random() * 100000) + 1, partition_id
from public.parent, generate_series(1, 20) a
where random() < 0.1;

insert into public.result (result_id, parent_id, partition_id)
select a, parent_id, partition_id
from generate_series(1,100) a, public.parent
where random() < 0.001;

В таблицу parent вставляем 10 миллионов записей, в child будет примерно в два раза больше.
На одно значение child_index будет приходиться примерно 200 записей в child. На одно значение result_id — по одному миллиону записей parent.

После вставки записей анализируем все таблицы.

analyze public.parent;
analyze public.parent_1;
analyze public.parent_2;
analyze public.parent_3;
analyze public.child;
analyze public.child_1;
analyze public.child_2;
analyze public.child_3;
analyze public.result;

Итак, вставка в наши две таблицы, партиционированные стандартным для postgres способом, длилась 38:39, а с помощью pg_pathman — 15:10. То есть скорость вставки увеличилась в два с половиной раза. В pg_pathman в плане вставки появляется специальный шаг PartitionFilter для перенаправления вставляемой записи и эффект от него видно. Триггер уже не нужен. Также исключение триггера для перенаправления вставляемых записей даёт возможность узнать количество вставленных записей, как и при вставке в обычную таблицу. Для примера приведу план вставки в таблицу parent:

"Insert on parent  (cost=0.00..45.00 rows=1000 width=282)"
"  ->  Custom Scan (PartitionFilter)  (cost=0.00..45.00 rows=1000 width=282)"
"        ->  Subquery Scan on "*SELECT*"  (cost=0.00..45.00 rows=1000 width=282)"
"              ->  Function Scan on generate_series a  (cost=0.00..22.50 rows=1000 width=44)"

6. Внедряем pg_pathman в уже партиционированную схему


Для того чтобы корректно сравнивать производительность партиционирования необходимы одинаковые данные. Для этого я сделаю схему стандартно партиционированную, протестирую запросы по вышеописанной методике. И потом в эту схему внедрю расширение pg_pathman для тестирования с ним.

В расширении есть функции как для партиционирования пустой таблицы, так и для параллельного партиционирования таблицы с данными. Но механизма для регистрации уже партиционированной и заполненной данными таблицы пока нет, и это печально.

Зарегистрировать сразу главную таблицу мы не можем. pg_pathman тоже использует механизм наследования для обозначения связей партиций с главной таблицей.

И при попытке зарегистрировать главную таблицу, pg_pathman не обнаружит своих ограничений на партициях и выдаст ошибку.

Но эту проблему можно решить следующим способом:

1) Первым делом надо все партиции убрать из наследников главной:

alter table parent_1 no inherit parent;
alter table parent_2 no inherit parent;
alter table parent_3 no inherit parent;
alter table child_1 no inherit child;
alter table child_2 no inherit child;
alter table child_3 no inherit child;

2) Затем надо зарегистрировать главные таблицы в pg_pathman:

select public.add_to_pathman_config(c.oid, 'partition_id', '1')
from pg_class c
inner join pg_namespace n on n.oid = c.relnamespace and n.nspname = 'public'
where c.relname in ('parent', 'child');

3) Зарегистрировать партиции в pg_pathman:

select public.attach_range_partition(parent.oid, child.oid,  partition_id.partition_id::numeric, (partition_id.partition_id + 1)::numeric)
from pg_namespace n  
inner join pg_class parent on n.oid = parent.relnamespace and parent.relname in ('parent', 'child')
inner join generate_series(1,3) partition_id on 1=1
inner join pg_class child on n.oid = child.relnamespace and child.relname = parent.relname || '_' || partition_id.partition_id
where n.nspname = 'public';

4) Теперь можно удалить все триггеры и триггерную функцию:

drop trigger parent_trigger on public.parent;
drop trigger child_trigger on public.child;
drop function trigger_function();

5) В конце надо проверить вставку и запросы. Всё должно работать.

7. Тестируем запросы


Стандартный механизм партиционирования работает через ограничения на партициях, которые он сравнивает с константами, указанными в запросе.

В данной статье я использую версию 1.4.2 pg_pathman. В документации к расширению говорится о новом специальном типе шага плана запроса RuntimeAppend, который применяется вместо стандартного Append и отличается от него избирательным сканированием партиций с нужным ключом уже в ходе выполнения запроса. То есть отсечение партиций в этом случае должно работать динамически по ключу партиционирования каждой записи.

Поэтому сперва мы будем смотреть как работают запросы с одним константным значением ключа партиционирования, затем с двумя константными значениями и в конце со значениями ключа партиционирования из обычной таблицы. Планы запросов будут со статистикой выполнения. Сначала будут приводиться результаты запроса для обычного механизма партиционирования (postgres), а потом через pg_pathman.

7.1. Запрос с соединением партиционированных таблиц во from


Одна константа в условии на ключ партиционирования:

select *
from public.parent p, public.child c
where c.partition_id = p.partition_id and c.parent_id = p.parent_id and p.partition_id = 3
limit 100;

postgres:

"Limit  (cost=20.41..31.31 rows=100 width=181) (actual time=2.517..3.093 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"  Buffers: shared hit=4 read=9"
...
"Planning time: 19.035 ms"
"Execution time: 3.607 ms"

Полностью
"Limit  (cost=20.41..31.31 rows=100 width=181) (actual time=2.517..3.093 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"  Buffers: shared hit=4 read=9"
"  ->  Merge Join  (cost=20.41..726234.12 rows=6662376 width=181) (actual time=2.516..3.079 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"        Merge Cond: (p.parent_id = c.parent_id)"
"        Buffers: shared hit=4 read=9"
"        ->  Merge Append  (cost=0.56..184529.87 rows=3332147 width=84) (actual time=1.397..1.414 rows=47 loops=1)"
"              Sort Key: p.parent_id"
"              Buffers: shared hit=1 read=4"
"              ->  Index Scan using parent_pkey on public.parent p  (cost=0.12..8.14 rows=1 width=282) (actual time=0.006..0.006 rows=0 loops=1)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: (p.partition_id = '3'::numeric)"
"                    Buffers: shared hit=1"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p_1  (cost=0.43..142869.88 rows=3332146 width=84) (actual time=1.389..1.400 rows=47 loops=1)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Filter: (p_1.partition_id = '3'::numeric)"
"                    Buffers: shared read=4"
"        ->  Materialize  (cost=0.57..450113.45 rows=6662376 width=97) (actual time=1.094..1.605 rows=100 loops=1)"
"              Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"              Buffers: shared hit=3 read=5"
"              ->  Merge Append  (cost=0.57..433457.51 rows=6662376 width=97) (actual time=1.089..1.590 rows=100 loops=1)"
"                    Sort Key: c.parent_id"
"                    Buffers: shared hit=3 read=5"
"                    ->  Index Scan using ix_child_parent on public.child c  (cost=0.12..8.14 rows=1 width=346) (actual time=0.004..0.004 rows=0 loops=1)"
"                          Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"                          Filter: (c.partition_id = '3'::numeric)"
"                          Buffers: shared hit=1"
"                    ->  Index Scan using child_3_parent_id_idx on public.child_3 c_1  (cost=0.43..350169.66 rows=6662375 width=97) (actual time=1.083..1.554 rows=100 loops=1)"
"                          Output: c_1.child_id, c_1.parent_id, c_1.child_data, c_1.child_index, c_1.partition_id"
"                          Filter: (c_1.partition_id = '3'::numeric)"
"                          Buffers: shared hit=2 read=5"
"Planning time: 19.035 ms"
"Execution time: 3.607 ms"

pg_pathman:

"Limit  (cost=15.99..25.01 rows=100 width=628) (actual time=0.934..1.111 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"  Buffers: shared hit=2 read=9"
...
"Planning time: 16.164 ms"
"Executiontime: 1.174 ms"

Полностью
"Limit  (cost=15.99..25.01 rows=100 width=628) (actual time=0.934..1.111 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"  Buffers: shared hit=2 read=9"
"  ->  Merge Join  (cost=15.99..601209.25 rows=6662375 width=628) (actual time=0.934..1.101 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"        Merge Cond: (p.parent_id = c.parent_id)"
"        Buffers: shared hit=2 read=9"
"        ->  Merge Append  (cost=0.44..142832.56 rows=3332146 width=84) (actual time=0.464..0.480 rows=47 loops=1)"
"              Sort Key: p.parent_id"
"              Buffers: shared read=4"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p  (cost=0.43..142832.55 rows=3332146 width=84) (actual time=0.464..0.477 rows=47 loops=1)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: (p.partition_id = '3'::numeric)"
"                    Buffers: shared read=4"
"        ->  Materialize  (cost=0.44..366781.74 rows=6662375 width=97) (actual time=0.464..0.583 rows=100 loops=1)"
"              Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"              Buffers: shared hit=2 read=5"
"              ->  Merge Append  (cost=0.44..350125.80 rows=6662375 width=97) (actual time=0.459..0.563 rows=100 loops=1)"
"                    Sort Key: c.parent_id"
"                    Buffers: shared hit=2 read=5"
"                    ->  Index Scan using child_3_parent_id_idx on public.child_3 c  (cost=0.43..350125.79 rows=6662375 width=97) (actual time=0.459..0.550 rows=100 loops=1)"
"                          Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"                          Filter: (c.partition_id = '3'::numeric)"
"                          Buffers: shared hit=2 read=5"
"Planning time: 16.164 ms"
"Executiontime: 1.174 ms"

Две константы в условии на ключ партиционирования:

select *
from public.parent p, public.child c
where c.partition_id = p.partition_id and c.parent_id = p.parent_id and p.partition_id in (2, 3)
limit 100;

postgres:

"Limit  (cost=44.98..78.34 rows=100 width=181) (actual time=8.167..8.543 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"  Buffers: shared hit=2 read=21"
...
"Planning time: 17.618 ms"
"Execution time: 8.870 ms"

Полностью
"Limit  (cost=44.98..78.34 rows=100 width=181) (actual time=8.167..8.543 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"  Buffers: shared hit=2 read=21"
"  ->  Merge Join  (cost=44.98..2223585.66 rows=6665904 width=181) (actual time=8.165..8.534 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"        Merge Cond: (p.parent_id = c.parent_id)"
"        Join Filter: (p.partition_id = c.partition_id)"
"        Buffers: shared hit=2 read=21"
"        ->  Merge Append  (cost=1.01..408142.24 rows=6665413 width=84) (actual time=5.390..5.416 rows=53 loops=1)"
"              Sort Key: p.parent_id"
"              Buffers: shared hit=1 read=8"
"              ->  Index Scan using parent_pkey on public.parent p  (cost=0.12..8.14 rows=1 width=282) (actual time=0.004..0.004 rows=0 loops=1)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: (p.partition_id = ANY ('{2,3}'::numeric[]))"
"                    Buffers: shared hit=1"
"              ->  Index Scan using parent_2_pkey on public.parent_2 p_1  (cost=0.43..142956.36 rows=3333266 width=84) (actual time=3.594..3.598 rows=28 loops=1)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Filter: (p_1.partition_id = ANY ('{2,3}'::numeric[]))"
"                    Buffers: shared read=4"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p_2  (cost=0.43..142869.88 rows=3332146 width=84) (actual time=1.786..1.798 rows=26 loops=1)"
"                    Output: p_2.parent_id, p_2.parent_data, p_2.partition_id"
"                    Filter: (p_2.partition_id = ANY ('{2,3}'::numeric[]))"
"                    Buffers: shared read=4"
"        ->  Materialize  (cost=1.46..1498856.74 rows=19997711 width=97) (actual time=2.746..3.039 rows=173 loops=1)"
"              Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"              Buffers: shared hit=1 read=13"
"              ->  Merge Append  (cost=1.46..1448862.46 rows=19997711 width=97) (actual time=2.736..2.987 rows=173 loops=1)"
"                    Sort Key: c.parent_id"
"                    Buffers: shared hit=1 read=13"
"                    ->  Index Scan using ix_child_parent on public.child c  (cost=0.12..8.14 rows=1 width=346) (actual time=0.004..0.004 rows=0 loops=1)"
"                          Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"                          Buffers: shared hit=1"
"                    ->  Index Scan using child_1_parent_id_idx on public.child_1 c_1  (cost=0.43..331779.20 rows=6666670 width=97) (actual time=0.559..0.744 rows=74 loops=1)"
"                          Output: c_1.child_id, c_1.parent_id, c_1.child_data, c_1.child_index, c_1.partition_id"
"                          Buffers: shared read=5"
"                    ->  Index Scan using child_2_parent_id_idx on public.child_2 c_2  (cost=0.43..333612.86 rows=6668665 width=97) (actual time=1.022..1.029 rows=51 loops=1)"
"                          Output: c_2.child_id, c_2.parent_id, c_2.child_data, c_2.child_index, c_2.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_3_parent_id_idx on public.child_3 c_3  (cost=0.43..333513.72 rows=6662375 width=97) (actual time=1.146..1.152 rows=50 loops=1)"
"                          Output: c_3.child_id, c_3.parent_id, c_3.child_data, c_3.child_index, c_3.partition_id"
"                          Buffers: shared read=4"
"Planning time: 17.618 ms"
"Execution time: 8.870 ms"

pg_pathman:

"Limit  (cost=32.45..57.22 rows=100 width=628) (actual time=2.457..2.714 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"  Buffers: shared read=21"
...
"Planning time: 18.944 ms"
"Executiontime: 2.798 ms"

Полностью
"Limit  (cost=32.45..57.22 rows=100 width=628) (actual time=2.457..2.714 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"  Buffers: shared read=21"
"  ->  Merge Join  (cost=32.45..1651206.45 rows=6665904 width=628) (actual time=2.455..2.705 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"        Merge Cond: (p.parent_id = c.parent_id)"
"        Join Filter: (p.partition_id = c.partition_id)"
"        Buffers: shared read=21"
"        ->  Merge Append  (cost=0.87..285722.16 rows=6665412 width=84) (actual time=1.011..1.037 rows=53 loops=1)"
"              Sort Key: p.parent_id"
"              Buffers: shared read=8"
"              ->  Index Scan using parent_2_pkey on public.parent_2 p  (cost=0.43..142889.60 rows=3333266 width=84) (actual time=0.522..0.528 rows=28 loops=1)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: (p.partition_id = ANY ('{2,3}'::numeric[]))"
"                    Buffers: shared read=4"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p_1  (cost=0.43..142832.55 rows=3332146 width=84) (actual time=0.486..0.493 rows=26 loops=1)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Filter: (p_1.partition_id = ANY ('{2,3}'::numeric[]))"
"                    Buffers: shared read=4"
"        ->  Materialize  (cost=1.32..1048885.37 rows=19997710 width=97) (actual time=1.437..1.595 rows=173 loops=1)"
"              Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"              Buffers: shared read=13"
"              ->  Merge Append  (cost=1.32..998891.10 rows=19997710 width=97) (actual time=1.432..1.561 rows=173 loops=1)"
"                    Sort Key: c.parent_id"
"                    Buffers: shared read=13"
"                    ->  Index Scan using child_1_parent_id_idx on public.child_1 c  (cost=0.43..331772.15 rows=6666670 width=97) (actual time=0.433..0.510 rows=74 loops=1)"
"                          Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"                          Buffers: shared read=5"
"                    ->  Index Scan using child_2_parent_id_idx on public.child_2 c_1  (cost=0.43..333608.02 rows=6668665 width=97) (actual time=0.561..0.563 rows=51 loops=1)"
"                          Output: c_1.child_id, c_1.parent_id, c_1.child_data, c_1.child_index, c_1.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_3_parent_id_idx on public.child_3 c_2  (cost=0.43..333510.91 rows=6662375 width=97) (actual time=0.437..0.442 rows=50 loops=1)"
"                          Output: c_2.child_id, c_2.parent_id, c_2.child_data, c_2.child_index, c_2.partition_id"
"                          Buffers: shared read=4"
"Planning time: 18.944 ms"
"Executiontime: 2.798 ms"

Ключ партиционирования из таблицы:

select *
from public.result r, public.parent p
where r.partition_id = p.partition_id and r.parent_id = p.parent_id and r.result_id = 50
limit 100;

postgres:

"Limit  (cost=199.17..6528.51 rows=100 width=100) (actual time=14.815..65.450 rows=100 loops=1)"
"  Output: r.result_id, r.parent_id, r.partition_id, p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=651 read=431"
...
"Planning time: 13.773 ms"
"Execution time: 65.750 ms"

Полностью
"Limit  (cost=199.17..6528.51 rows=100 width=100) (actual time=14.815..65.450 rows=100 loops=1)"
"  Output: r.result_id, r.parent_id, r.partition_id, p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=651 read=431"
"  ->  Nested Loop  (cost=199.17..192800.93 rows=3043 width=100) (actual time=14.813..65.407 rows=100 loops=1)"
"        Output: r.result_id, r.parent_id, r.partition_id, p.parent_id, p.parent_data, p.partition_id"
"        Buffers: shared hit=651 read=431"
"        ->  Bitmap Heap Scan on public.result r  (cost=199.17..7003.62 rows=9129 width=16) (actual time=11.301..17.396 rows=100 loops=1)"
"              Output: r.result_id, r.parent_id, r.partition_id"
"              Recheck Cond: (r.result_id = '50'::numeric)"
"              Heap Blocks: exact=49"
"              Buffers: shared read=87"
"              ->  Bitmap Index Scan on idx_result  (cost=0.00..196.89 rows=9129 width=0) (actual time=10.438..10.438 rows=10051 loops=1)"
"                    Index Cond: (r.result_id = '50'::numeric)"
"                    Buffers: shared read=38"
"        ->  Append  (cost=0.00..20.31 rows=4 width=84) (actual time=0.255..0.477 rows=1 loops=100)"
"              Buffers: shared hit=651 read=344"
"              ->  Seq Scan on public.parent p  (cost=0.00..0.00 rows=1 width=282) (actual time=0.000..0.000 rows=0 loops=100)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: ((r.partition_id = p.partition_id) AND (r.parent_id = p.parent_id))"
"              ->  Index Scan using parent_1_pkey on public.parent_1 p_1  (cost=0.43..6.77 rows=1 width=84) (actual time=0.252..0.253 rows=1 loops=100)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Index Cond: (p_1.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = p_1.partition_id)"
"                    Buffers: shared hit=222 read=178"
"              ->  Index Scan using parent_2_pkey on public.parent_2 p_2  (cost=0.43..6.77 rows=1 width=84) (actual time=0.112..0.112 rows=0 loops=99)"
"                    Output: p_2.parent_id, p_2.parent_data, p_2.partition_id"
"                    Index Cond: (p_2.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = p_2.partition_id)"
"                    Buffers: shared hit=214 read=84"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p_3  (cost=0.43..6.77 rows=1 width=84) (actual time=0.108..0.108 rows=0 loops=99)"
"                    Output: p_3.parent_id, p_3.parent_data, p_3.partition_id"
"                    Index Cond: (p_3.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = p_3.partition_id)"
"                    Buffers: shared hit=215 read=82"
"Planning time: 13.773 ms"
"Execution time: 65.750 ms"

pg_pathman:

"Limit  (cost=199.60..2457.44 rows=100 width=298) (actual time=7.354..30.780 rows=100 loops=1)"
"  Output: r.result_id, r.parent_id, r.partition_id, p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=222 read=265"
...
"Planning time: 11.799 ms"
"Executiontime: 30.920 ms"

Полностью
"Limit  (cost=199.60..2457.44 rows=100 width=298) (actual time=7.354..30.780 rows=100 loops=1)"
"  Output: r.result_id, r.parent_id, r.partition_id, p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=222 read=265"
"  ->  Nested Loop  (cost=199.60..68905.63 rows=3043 width=298) (actual time=7.352..30.742 rows=100 loops=1)"
"        Output: r.result_id, r.parent_id, r.partition_id, p.parent_id, p.parent_data, p.partition_id"
"        Buffers: shared hit=222 read=265"
"        ->  Bitmap Heap Scan on public.result r  (cost=199.17..7003.62 rows=9129 width=16) (actual time=6.838..10.367 rows=100 loops=1)"
"              Output: r.result_id, r.parent_id, r.partition_id"
"              Recheck Cond: (r.result_id = '50'::numeric)"
"              Heap Blocks: exact=49"
"              Buffers: shared read=87"
"              ->  Bitmap Index Scan on idx_result  (cost=0.00..196.89 rows=9129 width=0) (actual time=5.997..5.997 rows=10051 loops=1)"
"                    Index Cond: (r.result_id = '50'::numeric)"
"                    Buffers: shared read=38"
"        ->  Custom Scan (RuntimeAppend)  (cost=0.43..6.77 rows=1 width=84) (actual time=0.194..0.196 rows=1 loops=100)"
"              Output: p.parent_id, p.parent_data, p.partition_id"
"              Prune by: (r.partition_id = p.partition_id)"
"              Buffers: shared hit=222 read=178"
"              ->  Index Scan using parent_1_pkey on public.parent_1 p_1  (cost=0.43..6.77 rows=1 width=84) (actual time=0.193..0.194 rows=1 loops=100)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Index Cond: (p_1.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = p_1.partition_id)"
"                    Buffers: shared hit=222 read=178"
"Planning time: 11.799 ms"
"Executiontime: 30.920 ms"

При стандартном способе партиционирования отсечение партиций с одной константой распространилось на связанную партиционированную таблицу и отрезало неподходящие части. Но пустая главная таблица всё равно просматривалась, так как на ней нет партиционной константы. В запросе с двумя константами pruning сработал только на таблице parent, на ключ партиционирования которой ставится условие. Соединённая с ней же таблица child просматривалась полностью. То есть отсечение партиций соединённых таблиц работает лишь при указании одного константного значения ключа партиционирования. В запросе же с ключами партиционирования из связанной таблицы отсечение не сработало совсем, что очень плохо, хотя и соответствует документации.

Расширение pg_pathman убрало из рассмотрения пустые главные таблицы, но для первого и второго запросов не поменяло структуру планов, количество чтений блоков почти одинаковое, что неудивительно.

Но с размером записей похоже в pg_pathman вылез баг: в Merge Join в обычном варианте размер записи считается сложением размеров записей объединяемых таблиц (width=182 = 84 + 98), в расширении же возможно берутся максимальные размеры записей соединяемых таблиц. А максимальные размеры – это теоретические максимальные размеры из главных таблиц parent и child (width=628 = 282 + 346). И это при том, что эти таблицы pg_pathman исключает из рассмотрения.

Также в двухконстантном запросе таблица child просматривается полностью без главной, pruning и их волшебный шаг RuntimeAppend почему-то не срабатывает.

В запросе же с ключами партиционирования из обычной таблицы срабатывает шаг RuntimeAppend, проход идёт только по нужной партиции, и это ускоряет выполнение запроса. Что неудивительно, достаточно сравнить количество чтений в запроса Buffers: shared hit=651 read=431 и Buffers: shared hit=222 read=265.

Запросы с соединениями через inner и left jon ведут себя аналогично, поэтому анализ их планов пропустим.

7.2. Запрос с подзапросом в полях.


Одна константа в условии на ключ партиционирования:

select *, (select min(c.child_data) from public.child c where c.partition_id = p.partition_id and c.parent_id = p.parent_id)
from public.parent p
where p.partition_id = 3
limit 100;

postgres:

"Limit  (cost=0.00..2588.75 rows=100 width=116) (actual time=1.939..4.835 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))"
"  Buffers: shared hit=977 read=18"
...
"Planning time: 14.819 ms"
"Execution time: 4.935 ms"

Полностью
"Limit  (cost=0.00..2588.75 rows=100 width=116) (actual time=1.939..4.835 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))"
"  Buffers: shared hit=977 read=18"
"  ->  Result  (cost=0.00..86260794.66 rows=3332147 width=116) (actual time=1.936..4.817 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id, (SubPlan 1)"
"        Buffers: shared hit=977 read=18"
"        ->  Append  (cost=0.00..89615.83 rows=3332147 width=84) (actual time=0.026..0.130 rows=100 loops=1)"
"              Buffers: shared read=2"
"              ->  Seq Scan on public.parent p  (cost=0.00..0.00 rows=1 width=282) (actual time=0.002..0.002 rows=0 loops=1)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: (p.partition_id = '3'::numeric)"
"              ->  Seq Scan on public.parent_3 p_1  (cost=0.00..89615.83 rows=3332146 width=84) (actual time=0.024..0.117 rows=100 loops=1)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Filter: (p_1.partition_id = '3'::numeric)"
"                    Buffers: shared read=2"
"        SubPlan 1"
"          ->  Aggregate  (cost=25.84..25.85 rows=1 width=32) (actual time=0.046..0.046 rows=1 loops=100)"
"                Output: min((c.child_data)::text)"
"                Buffers: shared hit=977 read=16"
"                ->  Append  (cost=0.00..25.78 rows=25 width=79) (actual time=0.042..0.044 rows=2 loops=100)"
"                      Buffers: shared hit=977 read=16"
"                      ->  Seq Scan on public.child c  (cost=0.00..0.00 rows=1 width=218) (actual time=0.000..0.000 rows=0 loops=100)"
"                            Output: c.child_data"
"                            Filter: ((c.partition_id = p.partition_id) AND (c.parent_id = p.parent_id))"
"                      ->  Index Scan using child_1_parent_id_idx on public.child_1 c_1  (cost=0.43..8.59 rows=8 width=73) (actual time=0.013..0.013 rows=0 loops=100)"
"                            Output: c_1.child_data"
"                            Index Cond: (c_1.parent_id = p.parent_id)"
"                            Filter: (c_1.partition_id = p.partition_id)"
"                            Buffers: shared hit=296 read=4"
"                      ->  Index Scan using child_2_parent_id_idx on public.child_2 c_2  (cost=0.43..8.59 rows=8 width=73) (actual time=0.012..0.012 rows=0 loops=100)"
"                            Output: c_2.child_data"
"                            Index Cond: (c_2.parent_id = p.parent_id)"
"                            Filter: (c_2.partition_id = p.partition_id)"
"                            Buffers: shared hit=296 read=4"
"                      ->  Index Scan using child_3_parent_id_idx on public.child_3 c_3  (cost=0.43..8.59 rows=8 width=73) (actual time=0.015..0.016 rows=2 loops=100)"
"                            Output: c_3.child_data"
"                            Index Cond: (c_3.parent_id = p.parent_id)"
"                            Filter: (c_3.partition_id = p.partition_id)"
"                            Buffers: shared hit=385 read=8"
"Planning time: 14.819 ms"
"Execution time: 4.935 ms"

pg_pathman:

"Limit  (cost=0.00..865.96 rows=100 width=314) (actual time=1.569..3.879 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))"
"  Buffers: shared hit=385 read=10"
...
"Planning time: 22.729 ms"
"Executiontime: 4.006 ms"

Полностью
"Limit  (cost=0.00..865.96 rows=100 width=314) (actual time=1.569..3.879 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))"
"  Buffers: shared hit=385 read=10"
"  ->  Result  (cost=0.00..28854985.31 rows=3332146 width=314) (actual time=1.564..3.858 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id, (SubPlan 1)"
"        Buffers: shared hit=385 read=10"
"        ->  Append  (cost=0.00..89615.83 rows=3332146 width=84) (actual time=0.403..0.531 rows=100 loops=1)"
"              Buffers: shared read=2"
"              ->  Seq Scan on public.parent_3 p  (cost=0.00..89615.83 rows=3332146 width=84) (actual time=0.401..0.518 rows=100 loops=1)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: (p.partition_id = '3'::numeric)"
"                    Buffers: shared read=2"
"        SubPlan 1"
"          ->  Aggregate  (cost=8.61..8.62 rows=1 width=32) (actual time=0.031..0.032 rows=1 loops=100)"
"                Output: min((c.child_data)::text)"
"                Buffers: shared hit=385 read=8"
"                ->  Custom Scan (RuntimeAppend)  (cost=0.43..8.59 rows=8 width=218) (actual time=0.024..0.025 rows=2 loops=100)"
"                      Output: c.child_data"
"                      Prune by: (c.partition_id = p.partition_id)"
"                      Buffers: shared hit=385 read=8"
"                      ->  Index Scan using child_3_parent_id_idx on public.child_3 c_1  (cost=0.43..8.59 rows=8 width=73) (actual time=0.023..0.024 rows=2 loops=100)"
"                            Output: c_1.child_data, c_1.partition_id"
"                            Index Cond: (c_1.parent_id = p.parent_id)"
"                            Filter: (c_1.partition_id = p.partition_id)"
"                            Buffers: shared hit=385 read=8"
"Planning time: 22.729 ms"
"Executiontime: 4.006 ms"

Две константы в условии на ключ партиционирования:

select *, (select min(c.child_data) from public.child c where c.partition_id = p.partition_id and c.parent_id = p.parent_id)
from public.parent p
where p.partition_id in (2, 3)
limit 100;

postgres:

"Limit  (cost=0.00..2588.75 rows=100 width=116) (actual time=1.420..2.874 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))"
"  Buffers: shared hit=978 read=15"
...
"Planning time: 15.798 ms"
"Execution time: 2.988 ms"

Полностью
"Limit  (cost=0.00..2588.75 rows=100 width=116) (actual time=1.420..2.874 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))"
"  Buffers: shared hit=978 read=15"
"  ->  Result  (cost=0.00..172550562.28 rows=6665413 width=116) (actual time=1.418..2.868 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id, (SubPlan 1)"
"        Buffers: shared hit=978 read=15"
"        ->  Append  (cost=0.00..179266.65 rows=6665413 width=84) (actual time=0.122..0.170 rows=100 loops=1)"
"              Buffers: shared read=2"
"              ->  Seq Scan on public.parent p  (cost=0.00..0.00 rows=1 width=282) (actual time=0.002..0.002 rows=0 loops=1)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: (p.partition_id = ANY ('{2,3}'::numeric[]))"
"              ->  Seq Scan on public.parent_2 p_1  (cost=0.00..89650.83 rows=3333266 width=84) (actual time=0.119..0.153 rows=100 loops=1)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Filter: (p_1.partition_id = ANY ('{2,3}'::numeric[]))"
"                    Buffers: shared read=2"
"              ->  Seq Scan on public.parent_3 p_2  (cost=0.00..89615.83 rows=3332146 width=84) (never executed)"
"                    Output: p_2.parent_id, p_2.parent_data, p_2.partition_id"
"                    Filter: (p_2.partition_id = ANY ('{2,3}'::numeric[]))"
"        SubPlan 1"
"          ->  Aggregate  (cost=25.84..25.85 rows=1 width=32) (actual time=0.026..0.026 rows=1 loops=100)"
"                Output: min((c.child_data)::text)"
"                Buffers: shared hit=978 read=13"
"                ->  Append  (cost=0.00..25.78 rows=25 width=79) (actual time=0.018..0.025 rows=2 loops=100)"
"                      Buffers: shared hit=978 read=13"
"                      ->  Seq Scan on public.child c  (cost=0.00..0.00 rows=1 width=218) (actual time=0.000..0.000 rows=0 loops=100)"
"                            Output: c.child_data"
"                            Filter: ((c.partition_id = p.partition_id) AND (c.parent_id = p.parent_id))"
"                      ->  Index Scan using child_1_parent_id_idx on public.child_1 c_1  (cost=0.43..8.59 rows=8 width=73) (actual time=0.006..0.006 rows=0 loops=100)"
"                            Output: c_1.child_data"
"                            Index Cond: (c_1.parent_id = p.parent_id)"
"                            Filter: (c_1.partition_id = p.partition_id)"
"                            Buffers: shared hit=297 read=3"
"                      ->  Index Scan using child_2_parent_id_idx on public.child_2 c_2  (cost=0.43..8.59 rows=8 width=73) (actual time=0.011..0.011 rows=2 loops=100)"
"                            Output: c_2.child_data"
"                            Index Cond: (c_2.parent_id = p.parent_id)"
"                            Filter: (c_2.partition_id = p.partition_id)"
"                            Buffers: shared hit=384 read=7"
"                      ->  Index Scan using child_3_parent_id_idx on public.child_3 c_3  (cost=0.43..8.59 rows=8 width=73) (actual time=0.007..0.007 rows=0 loops=100)"
"                            Output: c_3.child_data"
"                            Index Cond: (c_3.parent_id = p.parent_id)"
"                            Filter: (c_3.partition_id = p.partition_id)"
"                            Buffers: shared hit=297 read=3"
"Planning time: 15.798 ms"
"Execution time: 2.988 ms"

pg_pathman:

"Limit  (cost=0.00..865.96 rows=100 width=314) (actual time=0.672..1.692 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))"
"  Buffers: shared hit=384 read=9"
...
"Planning time: 17.897 ms"
"Execution time: 1.774 ms"

Полностью
"Limit  (cost=0.00..865.96 rows=100 width=314) (actual time=0.672..1.692 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))"
"  Buffers: shared hit=384 read=9"
"  ->  Result  (cost=0.00..57719674.22 rows=6665412 width=314) (actual time=0.670..1.683 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id, (SubPlan 1)"
"        Buffers: shared hit=384 read=9"
"        ->  Append  (cost=0.00..179266.65 rows=6665412 width=84) (actual time=0.126..0.270 rows=100 loops=1)"
"              Buffers: shared read=2"
"              ->  Seq Scan on public.parent_2 p  (cost=0.00..89650.83 rows=3333266 width=84) (actual time=0.124..0.257 rows=100 loops=1)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: (p.partition_id = ANY ('{2,3}'::numeric[]))"
"                    Buffers: shared read=2"
"              ->  Seq Scan on public.parent_3 p_1  (cost=0.00..89615.83 rows=3332146 width=84) (never executed)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Filter: (p_1.partition_id = ANY ('{2,3}'::numeric[]))"
"        SubPlan 1"
"          ->  Aggregate  (cost=8.61..8.62 rows=1 width=32) (actual time=0.013..0.014 rows=1 loops=100)"
"                Output: min((c.child_data)::text)"
"                Buffers: shared hit=384 read=7"
"                ->  Custom Scan (RuntimeAppend)  (cost=0.43..8.59 rows=8 width=218) (actual time=0.010..0.010 rows=2 loops=100)"
"                      Output: c.child_data"
"                      Prune by: (c.partition_id = p.partition_id)"
"                      Buffers: shared hit=384 read=7"
"                      ->  Index Scan using child_2_parent_id_idx on public.child_2 c_1  (cost=0.43..8.59 rows=8 width=73) (actual time=0.009..0.010 rows=2 loops=100)"
"                            Output: c_1.child_data, c_1.partition_id"
"                            Index Cond: (c_1.parent_id = p.parent_id)"
"                            Filter: (c_1.partition_id = p.partition_id)"
"                            Buffers: shared hit=384 read=7"
"Planning time: 17.897 ms"
"Execution time: 1.774 ms"

Ключ партиционирования из таблицы:

select *, (select min(p.parent_data) from public.parent p where r.partition_id = p.partition_id and r.parent_id = p.parent_id)
from public.result r
where r.result_id = 50
limit 100;

postgres:

"Limit  (cost=0.00..2743.60 rows=100 width=48) (actual time=1.751..41.373 rows=100 loops=1)"
"  Output: r.result_id, r.parent_id, r.partition_id, ((SubPlan 1))"
"  Buffers: shared hit=655 read=409"
...
"Planning time: 10.229 ms"
"Execution time: 41.462 ms"

Полностью
"Limit  (cost=0.00..2743.60 rows=100 width=48) (actual time=1.751..41.373 rows=100 loops=1)"
"  Output: r.result_id, r.parent_id, r.partition_id, ((SubPlan 1))"
"  Buffers: shared hit=655 read=409"
"  ->  Seq Scan on public.result r  (cost=0.00..250463.09 rows=9129 width=48) (actual time=1.748..41.339 rows=100 loops=1)"
"        Output: r.result_id, r.parent_id, r.partition_id, (SubPlan 1)"
"        Filter: (r.result_id = '50'::numeric)"
"        Rows Removed by Filter: 9709"
"        Buffers: shared hit=655 read=409"
"        SubPlan 1"
"          ->  Aggregate  (cost=25.36..25.37 rows=1 width=32) (actual time=0.382..0.382 rows=1 loops=100)"
"                Output: min((p.parent_data)::text)"
"                Buffers: shared hit=655 read=346"
"                ->  Append  (cost=0.00..25.35 rows=4 width=109) (actual time=0.221..0.379 rows=1 loops=100)"
"                      Buffers: shared hit=655 read=346"
"                      ->  Seq Scan on public.parent p  (cost=0.00..0.00 rows=1 width=218) (actual time=0.000..0.000 rows=0 loops=100)"
"                            Output: p.parent_data"
"                            Filter: ((r.partition_id = p.partition_id) AND (r.parent_id = p.parent_id))"
"                      ->  Index Scan using parent_1_pkey on public.parent_1 p_1  (cost=0.43..8.45 rows=1 width=73) (actual time=0.218..0.219 rows=1 loops=100)"
"                            Output: p_1.parent_data"
"                            Index Cond: (r.parent_id = p_1.parent_id)"
"                            Filter: (r.partition_id = p_1.partition_id)"
"                            Buffers: shared hit=222 read=178"
"                      ->  Index Scan using parent_2_pkey on public.parent_2 p_2  (cost=0.43..8.45 rows=1 width=73) (actual time=0.078..0.078 rows=0 loops=100)"
"                            Output: p_2.parent_data"
"                            Index Cond: (r.parent_id = p_2.parent_id)"
"                            Filter: (r.partition_id = p_2.partition_id)"
"                            Buffers: shared hit=216 read=85"
"                      ->  Index Scan using parent_3_pkey on public.parent_3 p_3  (cost=0.43..8.45 rows=1 width=73) (actual time=0.077..0.077 rows=0 loops=100)"
"                            Output: p_3.parent_data"
"                            Index Cond: (r.parent_id = p_3.parent_id)"
"                            Filter: (r.partition_id = p_3.partition_id)"
"                            Buffers: shared hit=217 read=83"
"Planning time: 10.229 ms"
"Execution time: 41.462 ms"

pg_pathman:

"Limit  (cost=0.00..1052.85 rows=100 width=48) (actual time=0.946..27.732 rows=100 loops=1)"
"  Output: r.result_id, r.parent_id, r.partition_id, ((SubPlan 1))"
"  Buffers: shared hit=222 read=241"
...
"Planning time: 13.332 ms"
"Executiontime: 27.792 ms"

Полностью
"Limit  (cost=0.00..1052.85 rows=100 width=48) (actual time=0.946..27.732 rows=100 loops=1)"
"  Output: r.result_id, r.parent_id, r.partition_id, ((SubPlan 1))"
"  Buffers: shared hit=222 read=241"
"  ->  Seq Scan on public.result r  (cost=0.00..96114.52 rows=9129 width=48) (actual time=0.944..27.685 rows=100 loops=1)"
"        Output: r.result_id, r.parent_id, r.partition_id, (SubPlan 1)"
"        Filter: (r.result_id = '50'::numeric)"
"        Rows Removed by Filter: 9709"
"        Buffers: shared hit=222 read=241"
"        SubPlan 1"
"          ->  Aggregate  (cost=8.45..8.46 rows=1 width=32) (actual time=0.245..0.245 rows=1 loops=100)"
"                Output: min((p.parent_data)::text)"
"                Buffers: shared hit=222 read=178"
"                ->  Custom Scan (RuntimeAppend)  (cost=0.43..8.45 rows=1 width=218) (actual time=0.232..0.233 rows=1 loops=100)"
"                      Output: p.parent_data"
"                      Prune by: (r.partition_id = p.partition_id)"
"                      Buffers: shared hit=222 read=178"
"                      ->  Index Scan using parent_1_pkey on public.parent_1 p_1  (cost=0.43..8.45 rows=1 width=73) (actual time=0.230..0.231 rows=1 loops=100)"
"                            Output: p_1.parent_data, p_1.partition_id"
"                            Index Cond: (r.parent_id = p_1.parent_id)"
"                            Filter: (r.partition_id = p_1.partition_id)"
"                            Buffers: shared hit=222 read=178"
"Planning time: 13.332 ms"
"Executiontime: 27.792 ms"

А вот с подзапросом всё интересней: в стандартном режиме проброс константы в подзапрос не сработал, и там перебираются все партиции. pg_pathman же включает свой шаг RuntimeAppend и в результате запросы перебирают 400-500 блоков вместо 1000. Для большего количества партиций эффективность ещё увеличится.

Во втором запросе проявилась одна интересная особенность шага Append postgres: видно, что партиция 3 не просматривалась (never executed), то есть шаг нашёл требуемые сто записей во второй партиции, и не стал лезть в третью.

7.3. Запрос с соединением с партиционированными таблицами, соединёнными через union


Одна константа в условии на ключ партиционирования:

select *
from public.parent p,
(
  select partition_id, parent_id, parent_data from public.parent
  union all
  select partition_id, parent_id, child_data from public.child
) u
where u.partition_id = p.partition_id and u.parent_id = p.parent_id and p.partition_id = 3
limit 100;

postgres:

"Limit  (cost=1.72..12.33 rows=100 width=168) (actual time=1.448..1.795 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, parent.partition_id, parent.parent_id, parent.parent_data"
"  Buffers: shared hit=9 read=9"
...
"Planning time: 10.951 ms"
"Execution time: 1.945 ms"

Полностью
"Limit  (cost=1.72..12.33 rows=100 width=168) (actual time=1.448..1.795 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, parent.partition_id, parent.parent_id, parent.parent_data"
"  Buffers: shared hit=9 read=9"
"  ->  Merge Join  (cost=1.72..1060723.28 rows=9994523 width=168) (actual time=1.447..1.785 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id, parent.partition_id, parent.parent_id, parent.parent_data"
"        Merge Cond: (p.parent_id = parent.parent_id)"
"        Buffers: shared hit=9 read=9"
"        ->  Merge Append  (cost=0.56..184533.19 rows=3332147 width=84) (actual time=0.660..0.677 rows=32 loops=1)"
"              Sort Key: p.parent_id"
"              Buffers: shared hit=1 read=4"
"              ->  Index Scan using parent_pkey on public.parent p  (cost=0.12..8.14 rows=1 width=282) (actual time=0.003..0.003 rows=0 loops=1)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: (p.partition_id = '3'::numeric)"
"                    Buffers: shared hit=1"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p_1  (cost=0.43..142873.20 rows=3332146 width=84) (actual time=0.656..0.667 rows=32 loops=1)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Filter: (p_1.partition_id = '3'::numeric)"
"                    Buffers: shared read=4"
"        ->  Materialize  (cost=1.15..742928.19 rows=9994523 width=84) (actual time=0.781..1.064 rows=100 loops=1)"
"              Output: parent.partition_id, parent.parent_id, parent.parent_data"
"              Buffers: shared hit=8 read=5"
"              ->  Merge Append  (cost=1.15..717941.88 rows=9994523 width=84) (actual time=0.773..1.025 rows=100 loops=1)"
"                    Sort Key: parent.parent_id"
"                    Buffers: shared hit=8 read=5"
"                    ->  Index Scan using parent_pkey on public.parent  (cost=0.12..8.14 rows=1 width=282) (actual time=0.004..0.004 rows=0 loops=1)"
"                          Output: parent.partition_id, parent.parent_id, parent.parent_data"
"                          Filter: (parent.partition_id = '3'::numeric)"
"                          Buffers: shared hit=1"
"                    ->  Index Scan using parent_3_pkey on public.parent_3  (cost=0.43..142873.20 rows=3332146 width=84) (actual time=0.008..0.023 rows=33 loops=1)"
"                          Output: parent_3.partition_id, parent_3.parent_id, parent_3.parent_data"
"                          Filter: (parent_3.partition_id = '3'::numeric)"
"                          Buffers: shared hit=4"
"                    ->  Index Scan using ix_child_parent on public.child  (cost=0.12..8.14 rows=1 width=282) (actual time=0.004..0.004 rows=0 loops=1)"
"                          Output: child.partition_id, child.parent_id, child.child_data"
"                          Filter: (child.partition_id = '3'::numeric)"
"                          Buffers: shared hit=1"
"                    ->  Index Scan using child_3_parent_id_idx on public.child_3  (cost=0.43..350175.59 rows=6662375 width=84) (actual time=0.755..0.953 rows=68 loops=1)"
"                          Output: child_3.partition_id, child_3.parent_id, child_3.child_data"
"                          Filter: (child_3.partition_id = '3'::numeric)"
"                          Buffers: shared hit=2 read=5"
"Planning time: 10.951 ms"
"Execution time: 1.945 ms"

pg_pathman:

"Limit  (cost=1.31..794053.33 rows=2 width=564) (actual time=0.926..1.108 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, parent_3.partition_id, parent_3.parent_id, parent_3.parent_data"
"  Buffers: shared hit=6 read=9"
...
"Planning time: 13.963 ms"
"Executiontime: 1.197 ms"

Полностью
"Limit  (cost=1.31..794053.33 rows=2 width=564) (actual time=0.926..1.108 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, parent_3.partition_id, parent_3.parent_id, parent_3.parent_data"
"  Buffers: shared hit=6 read=9"
"  ->  Merge Join  (cost=1.31..794053.33 rows=2 width=564) (actual time=0.925..1.097 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id, parent_3.partition_id, parent_3.parent_id, parent_3.parent_data"
"        Merge Cond: (p.parent_id = parent_3.parent_id)"
"        Buffers: shared hit=6 read=9"
"        ->  Merge Append  (cost=0.44..142832.56 rows=3332146 width=84) (actual time=0.435..0.448 rows=32 loops=1)"
"              Sort Key: p.parent_id"
"              Buffers: shared read=4"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p  (cost=0.43..142832.55 rows=3332146 width=84) (actual time=0.435..0.445 rows=32 loops=1)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: (p.partition_id = '3'::numeric)"
"                    Buffers: shared read=4"
"        ->  Materialize  (cost=0.87..517958.89 rows=9994521 width=282) (actual time=0.482..0.609 rows=100 loops=1)"
"              Output: parent_3.partition_id, parent_3.parent_id, parent_3.parent_data"
"              Buffers: shared hit=6 read=5"
"              ->  Merge Append  (cost=0.87..492972.59 rows=9994521 width=282) (actual time=0.477..0.586 rows=100 loops=1)"
"                    Sort Key: parent_3.parent_id"
"                    Buffers: shared hit=6 read=5"
"                    ->  Index Scan using parent_3_pkey on public.parent_3  (cost=0.43..142832.55 rows=3332146 width=84) (actual time=0.008..0.017 rows=33 loops=1)"
"                          Output: parent_3.partition_id, parent_3.parent_id, parent_3.parent_data"
"                          Filter: (parent_3.partition_id = '3'::numeric)"
"                          Buffers: shared hit=4"
"                    ->  Index Scan using child_3_parent_id_idx on public.child_3  (cost=0.43..350140.00 rows=6662375 width=84) (actual time=0.469..0.549 rows=68 loops=1)"
"                          Output: child_3.partition_id, child_3.parent_id, child_3.child_data"
"                          Filter: (child_3.partition_id = '3'::numeric)"
"                          Buffers: shared hit=2 read=5"
"Planning time: 13.963 ms"
"Executiontime: 1.197 ms"

Две константы в условии на ключ партиционирования:

select *
from public.parent p,
(
  select partition_id, parent_id, parent_data from public.parent
  union all
  select partition_id, parent_id, child_data from public.child
) u
where u.partition_id = p.partition_id and u.parent_id = p.parent_id and p.partition_id in (2, 3)
limit 100;

postgres:

"Limit  (cost=3.97..2222.49 rows=100 width=168) (actual time=6.006..6.250 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, parent.partition_id, parent.parent_id, parent.parent_data"
"  Buffers: shared hit=11 read=24"
...
"Planning time: 15.399 ms"
"Execution time: 6.391 ms"

Полностью
"Limit  (cost=3.97..2222.49 rows=100 width=168) (actual time=6.006..6.250 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, parent.partition_id, parent.parent_id, parent.parent_data"
"  Buffers: shared hit=11 read=24"
"  ->  Merge Join  (cost=3.97..3327548.38 rows=149989 width=168) (actual time=6.006..6.235 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id, parent.partition_id, parent.parent_id, parent.parent_data"
"        Merge Cond: (p.parent_id = parent.parent_id)"
"        Join Filter: (p.partition_id = parent.partition_id)"
"        Buffers: shared hit=11 read=24"
"        ->  Merge Append  (cost=1.01..408151.48 rows=6665413 width=84) (actual time=1.012..1.029 rows=35 loops=1)"
"              Sort Key: p.parent_id"
"              Buffers: shared hit=1 read=8"
"              ->  Index Scan using parent_pkey on public.parent p  (cost=0.12..8.14 rows=1 width=282) (actual time=0.003..0.003 rows=0 loops=1)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: (p.partition_id = ANY ('{2,3}'::numeric[]))"
"                    Buffers: shared hit=1"
"              ->  Index Scan using parent_2_pkey on public.parent_2 p_1  (cost=0.43..142962.29 rows=3333266 width=84) (actual time=0.516..0.520 rows=18 loops=1)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Filter: (p_1.partition_id = ANY ('{2,3}'::numeric[]))"
"                    Buffers: shared read=4"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p_2  (cost=0.43..142873.20 rows=3332146 width=84) (actual time=0.491..0.500 rows=18 loops=1)"
"                    Output: p_2.parent_id, p_2.parent_data, p_2.partition_id"
"                    Filter: (p_2.partition_id = ANY ('{2,3}'::numeric[]))"
"                    Buffers: shared read=4"
"        ->  Materialize  (cost=2.96..2452767.42 rows=29997730 width=84) (actual time=4.978..5.135 rows=173 loops=1)"
"              Output: parent.partition_id, parent.parent_id, parent.parent_data"
"              Buffers: shared hit=10 read=16"
"              ->  Merge Append  (cost=2.96..2377773.10 rows=29997730 width=84) (actual time=4.964..5.057 rows=173 loops=1)"
"                    Sort Key: parent.parent_id"
"                    Buffers: shared hit=10 read=16"
"                    ->  Index Scan using parent_pkey on public.parent  (cost=0.12..8.14 rows=1 width=282) (actual time=0.003..0.003 rows=0 loops=1)"
"                          Output: parent.partition_id, parent.parent_id, parent.parent_data"
"                          Buffers: shared hit=1"
"                    ->  Index Scan using parent_1_pkey on public.parent_1  (cost=0.43..134721.74 rows=3334606 width=84) (actual time=0.411..0.415 rows=25 loops=1)"
"                          Output: parent_1.partition_id, parent_1.parent_id, parent_1.parent_data"
"                          Buffers: shared read=4"
"                    ->  Index Scan using parent_2_pkey on public.parent_2  (cost=0.43..134629.12 rows=3333266 width=84) (actual time=0.005..0.006 rows=19 loops=1)"
"                          Output: parent_2.partition_id, parent_2.parent_id, parent_2.parent_data"
"                          Buffers: shared hit=4"
"                    ->  Index Scan using parent_3_pkey on public.parent_3  (cost=0.43..134542.83 rows=3332146 width=84) (actual time=0.005..0.010 rows=18 loops=1)"
"                          Output: parent_3.partition_id, parent_3.parent_id, parent_3.parent_data"
"                          Buffers: shared hit=4"
"                    ->  Index Scan using ix_child_parent on public.child  (cost=0.12..8.14 rows=1 width=282) (actual time=0.001..0.001 rows=0 loops=1)"
"                          Output: child.partition_id, child.parent_id, child.child_data"
"                          Buffers: shared hit=1"
"                    ->  Index Scan using child_1_parent_id_idx on public.child_1  (cost=0.43..331794.06 rows=6666670 width=84) (actual time=3.310..3.321 rows=50 loops=1)"
"                          Output: child_1.partition_id, child_1.parent_id, child_1.child_data"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_2_parent_id_idx on public.child_2  (cost=0.43..333623.06 rows=6668665 width=84) (actual time=0.553..0.553 rows=29 loops=1)"
"                          Output: child_2.partition_id, child_2.parent_id, child_2.child_data"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_3_parent_id_idx on public.child_3  (cost=0.43..333519.65 rows=6662375 width=84) (actual time=0.670..0.679 rows=37 loops=1)"
"                          Output: child_3.partition_id, child_3.parent_id, child_3.child_data"
"                          Buffers: shared read=4"
"Planning time: 15.399 ms"
"Execution time: 6.391 ms"

pg_pathman:

"Limit  (cost=3.54..2230075.76 rows=1 width=564) (actual time=3.679..4.028 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, parent_1.partition_id, parent_1.parent_id, parent_1.parent_data"
"  Buffers: shared hit=8 read=24"
...
"Planning time: 20.042 ms"
"Execution time: 4.162 ms"

Полностью
"Limit  (cost=3.54..2230075.76 rows=1 width=564) (actual time=3.679..4.028 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, parent_1.partition_id, parent_1.parent_id, parent_1.parent_data"
"  Buffers: shared hit=8 read=24"
"  ->  Merge Join  (cost=3.54..2230075.76 rows=1 width=564) (actual time=3.678..4.014 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id, parent_1.partition_id, parent_1.parent_id, parent_1.parent_data"
"        Merge Cond: (p.parent_id = parent_1.parent_id)"
"        Join Filter: (p.partition_id = parent_1.partition_id)"
"        Buffers: shared hit=8 read=24"
"        ->  Merge Append  (cost=0.87..285722.16 rows=6665412 width=84) (actual time=1.279..1.299 rows=35 loops=1)"
"              Sort Key: p.parent_id"
"              Buffers: shared read=8"
"              ->  Index Scan using parent_2_pkey on public.parent_2 p  (cost=0.43..142889.60 rows=3333266 width=84) (actual time=0.492..0.498 rows=18 loops=1)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: (p.partition_id = ANY ('{2,3}'::numeric[]))"
"                    Buffers: shared read=4"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p_1  (cost=0.43..142832.55 rows=3332146 width=84) (actual time=0.786..0.793 rows=18 loops=1)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Filter: (p_1.partition_id = ANY ('{2,3}'::numeric[]))"
"                    Buffers: shared read=4"
"        ->  Materialize  (cost=2.67..1477724.15 rows=29997728 width=282) (actual time=2.381..2.618 rows=173 loops=1)"
"              Output: parent_1.partition_id, parent_1.parent_id, parent_1.parent_data"
"              Buffers: shared hit=8 read=16"
"              ->  Merge Append  (cost=2.67..1402729.83 rows=29997728 width=282) (actual time=2.371..2.575 rows=173 loops=1)"
"                    Sort Key: parent_1.parent_id"
"                    Buffers: shared hit=8 read=16"
"                    ->  Index Scan using parent_1_pkey on public.parent_1  (cost=0.43..134675.44 rows=3334606 width=84) (actual time=0.482..0.508 rows=25 loops=1)"
"                          Output: parent_1.partition_id, parent_1.parent_id, parent_1.parent_data"
"                          Buffers: shared read=4"
"                    ->  Index Scan using parent_2_pkey on public.parent_2  (cost=0.43..134599.76 rows=3333266 width=84) (actual time=0.007..0.016 rows=19 loops=1)"
"                          Output: parent_2.partition_id, parent_2.parent_id, parent_2.parent_data"
"                          Buffers: shared hit=4"
"                    ->  Index Scan using parent_3_pkey on public.parent_3  (cost=0.43..134526.41 rows=3332146 width=84) (actual time=0.006..0.016 rows=18 loops=1)"
"                          Output: parent_3.partition_id, parent_3.parent_id, parent_3.parent_data"
"                          Buffers: shared hit=4"
"                    ->  Index Scan using child_1_parent_id_idx on public.child_1  (cost=0.43..331789.90 rows=6666670 width=84) (actual time=0.473..0.483 rows=50 loops=1)"
"                          Output: child_1.partition_id, child_1.parent_id, child_1.child_data"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_2_parent_id_idx on public.child_2  (cost=0.43..333620.20 rows=6668665 width=84) (actual time=0.716..0.724 rows=29 loops=1)"
"                          Output: child_2.partition_id, child_2.parent_id, child_2.child_data"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_3_parent_id_idx on public.child_3  (cost=0.43..333517.99 rows=6662375 width=84) (actual time=0.679..0.691 rows=37 loops=1)"
"                          Output: child_3.partition_id, child_3.parent_id, child_3.child_data"
"                          Buffers: shared read=4"
"Planning time: 20.042 ms"
"Execution time: 4.162 ms"

Ключ партиционирования из таблицы:

select *
from public.result r,
(
  select partition_id, parent_id, parent_data from public.parent
  union all
  select partition_id, parent_id, child_data from public.child
) u
where u.partition_id = r.partition_id and u.parent_id = r.parent_id and r.result_id = 50
limit 100;

postgres:

"Limit  (cost=199.17..27826.77 rows=100 width=100) (actual time=10.168..39.515 rows=100 loops=1)"
"  Output: r.result_id, r.parent_id, r.partition_id, parent.partition_id, parent.parent_id, parent.parent_data"
"  Buffers: shared hit=393 read=271"
...
"Planning time: 14.982 ms"
"Execution time: 39.747 ms"

Полностью
"Limit  (cost=199.17..27826.77 rows=100 width=100) (actual time=10.168..39.515 rows=100 loops=1)"
"  Output: r.result_id, r.parent_id, r.partition_id, parent.partition_id, parent.parent_id, parent.parent_data"
"  Buffers: shared hit=393 read=271"
"  ->  Nested Loop  (cost=199.17..423453.93 rows=1532 width=100) (actual time=10.166..39.486 rows=100 loops=1)"
"        Output: r.result_id, r.parent_id, r.partition_id, parent.partition_id, parent.parent_id, parent.parent_data"
"        Buffers: shared hit=393 read=271"
"        ->  Bitmap Heap Scan on public.result r  (cost=199.17..7003.62 rows=9129 width=16) (actual time=9.257..10.275 rows=31 loops=1)"
"              Output: r.result_id, r.parent_id, r.partition_id"
"              Recheck Cond: (r.result_id = '50'::numeric)"
"              Heap Blocks: exact=14"
"              Buffers: shared read=52"
"              ->  Bitmap Index Scan on idx_result  (cost=0.00..196.89 rows=9129 width=0) (actual time=7.507..7.507 rows=10051 loops=1)"
"                    Index Cond: (r.result_id = '50'::numeric)"
"                    Buffers: shared read=38"
"        ->  Append  (cost=0.00..45.33 rows=29 width=84) (actual time=0.249..0.939 rows=3 loops=31)"
"              Buffers: shared hit=393 read=219"
"              ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=282) (actual time=0.001..0.001 rows=0 loops=31)"
"                    Output: parent.partition_id, parent.parent_id, parent.parent_data"
"                    Filter: ((r.partition_id = parent.partition_id) AND (r.parent_id = parent.parent_id))"
"              ->  Index Scan using parent_1_pkey on public.parent_1  (cost=0.43..7.19 rows=1 width=84) (actual time=0.244..0.245 rows=1 loops=31)"
"                    Output: parent_1.partition_id, parent_1.parent_id, parent_1.parent_data"
"                    Index Cond: (parent_1.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = parent_1.partition_id)"
"                    Buffers: shared hit=71 read=53"
"              ->  Index Scan using parent_2_pkey on public.parent_2  (cost=0.43..7.19 rows=1 width=84) (actual time=0.102..0.102 rows=0 loops=31)"
"                    Output: parent_2.partition_id, parent_2.parent_id, parent_2.parent_data"
"                    Index Cond: (parent_2.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = parent_2.partition_id)"
"                    Buffers: shared hit=68 read=26"
"              ->  Index Scan using parent_3_pkey on public.parent_3  (cost=0.43..7.19 rows=1 width=84) (actual time=0.104..0.104 rows=0 loops=31)"
"                    Output: parent_3.partition_id, parent_3.parent_id, parent_3.parent_data"
"                    Index Cond: (parent_3.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = parent_3.partition_id)"
"                    Buffers: shared hit=67 read=26"
"              ->  Seq Scan on public.child  (cost=0.00..0.00 rows=1 width=282) (actual time=0.000..0.000 rows=0 loops=31)"
"                    Output: child.partition_id, child.parent_id, child.child_data"
"                    Filter: ((r.partition_id = child.partition_id) AND (r.parent_id = child.parent_id))"
"              ->  Index Scan using child_1_parent_id_idx on public.child_1  (cost=0.43..7.92 rows=8 width=84) (actual time=0.194..0.194 rows=2 loops=31)"
"                    Output: child_1.partition_id, child_1.parent_id, child_1.child_data"
"                    Index Cond: (child_1.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = child_1.partition_id)"
"                    Buffers: shared hit=64 read=56"
"              ->  Index Scan using child_2_parent_id_idx on public.child_2  (cost=0.43..7.92 rows=8 width=84) (actual time=0.144..0.144 rows=0 loops=30)"
"                    Output: child_2.partition_id, child_2.parent_id, child_2.child_data"
"                    Index Cond: (child_2.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = child_2.partition_id)"
"                    Buffers: shared hit=62 read=28"
"              ->  Index Scan using child_3_parent_id_idx on public.child_3  (cost=0.43..7.92 rows=8 width=84) (actual time=0.145..0.145 rows=0 loops=30)"
"                    Output: child_3.partition_id, child_3.parent_id, child_3.child_data"
"                    Index Cond: (child_3.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = child_3.partition_id)"
"                    Buffers: shared hit=61 read=30"
"Planning time: 14.982 ms"
"Execution time: 39.747 ms"

pg_pathman:

"Limit  (cost=199.60..411799.35 rows=1 width=298) (actual time=7.118..37.285 rows=100 loops=1)"
"  Output: r.result_id, r.parent_id, r.partition_id, parent_1.partition_id, parent_1.parent_id, parent_1.parent_data"
"  Buffers: shared hit=393 read=271"
...
"Planning time: 20.112 ms"
"Executiontime: 37.466 ms"

Полностью
"Limit  (cost=199.60..411799.35 rows=1 width=298) (actual time=7.118..37.285 rows=100 loops=1)"
"  Output: r.result_id, r.parent_id, r.partition_id, parent_1.partition_id, parent_1.parent_id, parent_1.parent_data"
"  Buffers: shared hit=393 read=271"
"  ->  Nested Loop  (cost=199.60..411799.35 rows=1 width=298) (actual time=7.115..37.259 rows=100 loops=1)"
"        Output: r.result_id, r.parent_id, r.partition_id, parent_1.partition_id, parent_1.parent_id, parent_1.parent_data"
"        Buffers: shared hit=393 read=271"
"        ->  Bitmap Heap Scan on public.result r  (cost=199.17..7003.62 rows=9129 width=16) (actual time=6.511..7.714 rows=31 loops=1)"
"              Output: r.result_id, r.parent_id, r.partition_id"
"              Recheck Cond: (r.result_id = '50'::numeric)"
"              Heap Blocks: exact=14"
"              Buffers: shared read=52"
"              ->  Bitmap Index Scan on idx_result  (cost=0.00..196.89 rows=9129 width=0) (actual time=5.642..5.642 rows=10051 loops=1)"
"                    Index Cond: (r.result_id = '50'::numeric)"
"                    Buffers: shared read=38"
"        ->  Append  (cost=0.43..44.07 rows=27 width=282) (actual time=0.209..0.950 rows=3 loops=31)"
"              Buffers: shared hit=393 read=219"
"              ->  Index Scan using parent_1_pkey on public.parent_1  (cost=0.43..6.77 rows=1 width=84) (actual time=0.207..0.208 rows=1 loops=31)"
"                    Output: parent_1.partition_id, parent_1.parent_id, parent_1.parent_data"
"                    Index Cond: (parent_1.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = parent_1.partition_id)"
"                    Buffers: shared hit=71 read=53"
"              ->  Index Scan using parent_2_pkey on public.parent_2  (cost=0.43..6.77 rows=1 width=84) (actual time=0.102..0.102 rows=0 loops=31)"
"                    Output: parent_2.partition_id, parent_2.parent_id, parent_2.parent_data"
"                    Index Cond: (parent_2.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = parent_2.partition_id)"
"                    Buffers: shared hit=68 read=26"
"              ->  Index Scan using parent_3_pkey on public.parent_3  (cost=0.43..6.77 rows=1 width=84) (actual time=0.121..0.121 rows=0 loops=31)"
"                    Output: parent_3.partition_id, parent_3.parent_id, parent_3.parent_data"
"                    Index Cond: (parent_3.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = parent_3.partition_id)"
"                    Buffers: shared hit=67 read=26"
"              ->  Index Scan using child_1_parent_id_idx on public.child_1  (cost=0.43..7.92 rows=8 width=84) (actual time=0.260..0.261 rows=2 loops=31)"
"                    Output: child_1.partition_id, child_1.parent_id, child_1.child_data"
"                    Index Cond: (child_1.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = child_1.partition_id)"
"                    Buffers: shared hit=64 read=56"
"              ->  Index Scan using child_2_parent_id_idx on public.child_2  (cost=0.43..7.92 rows=8 width=84) (actual time=0.125..0.125 rows=0 loops=30)"
"                    Output: child_2.partition_id, child_2.parent_id, child_2.child_data"
"                    Index Cond: (child_2.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = child_2.partition_id)"
"                    Buffers: shared hit=62 read=28"
"              ->  Index Scan using child_3_parent_id_idx on public.child_3  (cost=0.43..7.92 rows=8 width=84) (actual time=0.129..0.129 rows=0 loops=30)"
"                    Output: child_3.partition_id, child_3.parent_id, child_3.child_data"
"                    Index Cond: (child_3.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = child_3.partition_id)"
"                    Buffers: shared hit=61 read=30"
"Planning time: 20.112 ms"
"Executiontime: 37.466 ms"

В соединение с union подзапросом одна константа сработала и ограничила используемые партиции, pg_pathman лишь убрал из рассмотрения главные таблицы, не изменив структуру плана. В двухконстантном запросе константные значения подействовали лишь на таблицу parent и её партиции, подзапрос не был оптимизирован и без расширения, и с ним. В третьем запросе шаг RuntimeAppend не появился и разница со стандартным партиционированием незначительная. То есть тут есть ещё возможность задействования pg_pathman.

7.4. Запрос условием in


Одна константа в условии на ключ партиционирования:

select *
frompublic.parent p
where (p.partition_id, p.parent_id) in (select c.partition_id, c.parent_id from public.child c where c.child_index = 5000 and c.partition_id = 3)
limit 100;

postgres:

"Limit  (cost=273.21..858.33 rows=69 width=84) (actual time=5.816..30.351 rows=82 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=134 read=279"
...
"Planning time: 16.899 ms"
"Execution time: 30.710 ms"

Полностью
"Limit  (cost=273.21..858.33 rows=69 width=84) (actual time=5.816..30.351 rows=82 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=134 read=279"
"  ->  Nested Loop  (cost=273.21..858.33 rows=69 width=84) (actual time=5.807..30.321 rows=82 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id"
"        Buffers: shared hit=134 read=279"
"        ->  HashAggregate  (cost=273.21..273.90 rows=69 width=12) (actual time=5.228..5.276 rows=82 loops=1)"
"              Output: c.partition_id, c.parent_id"
"              Group Key: c.partition_id, c.parent_id"
"              Buffers: shared read=85"
"              ->  Append  (cost=0.00..272.87 rows=69 width=12) (actual time=1.185..5.143 rows=82 loops=1)"
"                    Buffers: shared read=85"
"                    ->  Seq Scan on public.child c  (cost=0.00..0.00 rows=1 width=64) (actual time=0.005..0.005 rows=0 loops=1)"
"                          Output: c.partition_id, c.parent_id"
"                          Filter: ((c.child_index = '5000'::numeric) AND (c.partition_id = '3'::numeric))"
"                    ->  Bitmap Heap Scan on public.child_3 c_1  (cost=4.96..272.87 rows=68 width=11) (actual time=1.180..5.118 rows=82 loops=1)"
"                          Output: c_1.partition_id, c_1.parent_id"
"                          Recheck Cond: (c_1.child_index = '5000'::numeric)"
"                          Filter: (c_1.partition_id = '3'::numeric)"
"                          Heap Blocks: exact=82"
"                          Buffers: shared read=85"
"                          ->  Bitmap Index Scan on child_3_child_index_idx  (cost=0.00..4.94 rows=68 width=0) (actual time=1.120..1.120 rows=82 loops=1)"
"                                Index Cond: (c_1.child_index = '5000'::numeric)"
"                                Buffers: shared read=3"
"        ->  Append  (cost=0.00..8.45 rows=2 width=84) (actual time=0.302..0.304 rows=1 loops=82)"
"              Buffers: shared hit=134 read=194"
"              ->  Seq Scan on public.parent p  (cost=0.00..0.00 rows=1 width=282) (actual time=0.000..0.000 rows=0 loops=82)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: ((p.partition_id = '3'::numeric) AND (c.parent_id = p.parent_id))"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p_1  (cost=0.43..8.45 rows=1 width=84) (actual time=0.299..0.300 rows=1 loops=82)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Index Cond: (p_1.parent_id = c.parent_id)"
"                    Filter: (p_1.partition_id = '3'::numeric)"
"                    Buffers: shared hit=134 read=194"
"Planning time: 16.899 ms"
"Execution time: 30.710 ms"

pg_pathman:

"Limit  (cost=273.64..849.17 rows=68 width=282) (actual time=11.865..34.016 rows=82 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=134 read=279"
...
"Planning time: 21.265 ms"
"Executiontime: 34.098 ms"

Полностью
"Limit  (cost=273.64..849.17 rows=68 width=282) (actual time=11.865..34.016 rows=82 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=134 read=279"
"  ->  Nested Loop  (cost=273.64..849.17 rows=68 width=282) (actual time=11.862..33.985 rows=82 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id"
"        Buffers: shared hit=134 read=279"
"        ->  HashAggregate  (cost=273.21..273.89 rows=68 width=11) (actual time=11.323..11.364 rows=82 loops=1)"
"              Output: c.partition_id, c.parent_id"
"              Group Key: c.partition_id, c.parent_id"
"              Buffers: shared read=85"
"              ->  Append  (cost=4.96..272.87 rows=68 width=11) (actual time=1.940..11.150 rows=82 loops=1)"
"                    Buffers: shared read=85"
"                    ->  Bitmap Heap Scan on public.child_3 c  (cost=4.96..272.87 rows=68 width=11) (actual time=1.938..11.132 rows=82 loops=1)"
"                          Output: c.partition_id, c.parent_id"
"                          Recheck Cond: (c.child_index = '5000'::numeric)"
"                          Filter: (c.partition_id = '3'::numeric)"
"                          Heap Blocks: exact=82"
"                          Buffers: shared read=85"
"                          ->  Bitmap Index Scan on child_3_child_index_idx  (cost=0.00..4.94 rows=68 width=0) (actual time=1.655..1.655 rows=82 loops=1)"
"                                Index Cond: (c.child_index = '5000'::numeric)"
"                                Buffers: shared read=3"
"        ->  Append  (cost=0.43..8.45 rows=1 width=84) (actual time=0.274..0.275 rows=1 loops=82)"
"              Buffers: shared hit=134 read=194"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p  (cost=0.43..8.45 rows=1 width=84) (actual time=0.273..0.274 rows=1 loops=82)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Index Cond: (p.parent_id = c.parent_id)"
"                    Filter: (p.partition_id = '3'::numeric)"
"                    Buffers: shared hit=134 read=194"
"Planning time: 21.265 ms"
"Executiontime: 34.098 ms"

Две константы в условии на ключ партиционирования:

select *
from public.parent p
where (p.partition_id, p.parent_id) in (select c.partition_id, c.parent_id from public.child c where c.child_index = 5000 and c.partition_id in (2, 3))
limit 100;

postgres:

"Limit  (cost=538.59..3078.59 rows=100 width=84) (actual time=20.449..90.114 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=510 read=642"
...
"Planning time: 14.024 ms"
"Execution time: 90.484 ms"

Полностью
"Limit  (cost=538.59..3078.59 rows=100 width=84) (actual time=20.449..90.114 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=510 read=642"
"  ->  Nested Loop  (cost=538.59..3967.59 rows=135 width=84) (actual time=20.446..90.045 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id"
"        Buffers: shared hit=510 read=642"
"        ->  HashAggregate  (cost=538.59..539.94 rows=135 width=11) (actual time=19.387..19.486 rows=100 loops=1)"
"              Output: c.partition_id, c.parent_id"
"              Group Key: c.partition_id, c.parent_id"
"              Buffers: shared read=155"
"              ->  Append  (cost=0.00..537.91 rows=135 width=11) (actual time=1.090..19.200 rows=149 loops=1)"
"                    Buffers: shared read=155"
"                    ->  Seq Scan on public.child c  (cost=0.00..0.00 rows=1 width=64) (actual time=0.005..0.005 rows=0 loops=1)"
"                          Output: c.partition_id, c.parent_id"
"                          Filter: ((c.partition_id = ANY ('{2,3}'::numeric[])) AND (c.child_index = '5000'::numeric))"
"                    ->  Bitmap Heap Scan on public.child_2 c_1  (cost=4.94..265.05 rows=66 width=11) (actual time=1.083..5.456 rows=67 loops=1)"
"                          Output: c_1.partition_id, c_1.parent_id"
"                          Recheck Cond: (c_1.child_index = '5000'::numeric)"
"                          Filter: (c_1.partition_id = ANY ('{2,3}'::numeric[]))"
"                          Heap Blocks: exact=67"
"                          Buffers: shared read=70"
"                          ->  Bitmap Index Scan on child_2_child_index_idx  (cost=0.00..4.93 rows=66 width=0) (actual time=1.003..1.003 rows=67 loops=1)"
"                                Index Cond: (c_1.child_index = '5000'::numeric)"
"                                Buffers: shared read=3"
"                    ->  Bitmap Heap Scan on public.child_3 c_2  (cost=4.96..272.87 rows=68 width=11) (actual time=1.271..13.668 rows=82 loops=1)"
"                          Output: c_2.partition_id, c_2.parent_id"
"                          Recheck Cond: (c_2.child_index = '5000'::numeric)"
"                          Filter: (c_2.partition_id = ANY ('{2,3}'::numeric[]))"
"                          Heap Blocks: exact=82"
"                          Buffers: shared read=85"
"                          ->  Bitmap Index Scan on child_3_child_index_idx  (cost=0.00..4.94 rows=68 width=0) (actual time=0.781..0.781 rows=82 loops=1)"
"                                Index Cond: (c_2.child_index = '5000'::numeric)"
"                                Buffers: shared read=3"
"        ->  Append  (cost=0.00..25.35 rows=4 width=84) (actual time=0.615..0.704 rows=1 loops=100)"
"              Buffers: shared hit=510 read=487"
"              ->  Seq Scan on public.parent p  (cost=0.00..0.00 rows=1 width=282) (actual time=0.000..0.000 rows=0 loops=100)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: ((c.partition_id = p.partition_id) AND (c.parent_id = p.parent_id))"
"              ->  Index Scan using parent_1_pkey on public.parent_1 p_1  (cost=0.43..8.45 rows=1 width=84) (actual time=0.193..0.193 rows=0 loops=100)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Index Cond: (p_1.parent_id = c.parent_id)"
"                    Filter: (c.partition_id = p_1.partition_id)"
"                    Buffers: shared hit=171 read=129"
"              ->  Index Scan using parent_2_pkey on public.parent_2 p_2  (cost=0.43..8.45 rows=1 width=84) (actual time=0.225..0.225 rows=0 loops=100)"
"                    Output: p_2.parent_id, p_2.parent_data, p_2.partition_id"
"                    Index Cond: (p_2.parent_id = c.parent_id)"
"                    Filter: (c.partition_id = p_2.partition_id)"
"                    Buffers: shared hit=170 read=172"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p_3  (cost=0.43..8.45 rows=1 width=84) (actual time=0.280..0.280 rows=1 loops=99)"
"                    Output: p_3.parent_id, p_3.parent_data, p_3.partition_id"
"                    Index Cond: (p_3.parent_id = c.parent_id)"
"                    Filter: (c.partition_id = p_3.partition_id)"
"                    Buffers: shared hit=169 read=186"
"Planning time: 14.024 ms"
"Execution time: 90.484 ms"

pg_pathman:

"Limit  (cost=539.01..1385.69 rows=100 width=282) (actual time=17.913..48.317 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=152 read=403"
...
"Planning time: 14.037 ms"
"Execution time: 48.413 ms"

Полностью
"Limit  (cost=539.01..1385.69 rows=100 width=282) (actual time=17.913..48.317 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=152 read=403"
"  ->  Nested Loop  (cost=539.01..1673.56 rows=134 width=282) (actual time=17.911..48.286 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id"
"        Buffers: shared hit=152 read=403"
"        ->  HashAggregate  (cost=538.58..539.92 rows=134 width=11) (actual time=17.359..17.411 rows=100 loops=1)"
"              Output: c.partition_id, c.parent_id"
"              Group Key: c.partition_id, c.parent_id"
"              Buffers: shared read=155"
"              ->  Append  (cost=4.94..537.91 rows=134 width=11) (actual time=0.503..17.149 rows=149 loops=1)"
"                    Buffers: shared read=155"
"                    ->  Bitmap Heap Scan on public.child_2 c  (cost=4.94..265.05 rows=66 width=11) (actual time=0.503..7.763 rows=67 loops=1)"
"                          Output: c.partition_id, c.parent_id"
"                          Recheck Cond: (c.child_index = '5000'::numeric)"
"                          Filter: (c.partition_id = ANY ('{2,3}'::numeric[]))"
"                          Heap Blocks: exact=67"
"                          Buffers: shared read=70"
"                          ->  Bitmap Index Scan on child_2_child_index_idx  (cost=0.00..4.93 rows=66 width=0) (actual time=0.392..0.392 rows=67 loops=1)"
"                                Index Cond: (c.child_index = '5000'::numeric)"
"                                Buffers: shared read=3"
"                    ->  Bitmap Heap Scan on public.child_3 c_1  (cost=4.96..272.87 rows=68 width=11) (actual time=0.591..9.346 rows=82 loops=1)"
"                          Output: c_1.partition_id, c_1.parent_id"
"                          Recheck Cond: (c_1.child_index = '5000'::numeric)"
"                          Filter: (c_1.partition_id = ANY ('{2,3}'::numeric[]))"
"                          Heap Blocks: exact=82"
"                          Buffers: shared read=85"
"                          ->  Bitmap Index Scan on child_3_child_index_idx  (cost=0.00..4.94 rows=68 width=0) (actual time=0.357..0.357 rows=82 loops=1)"
"                                Index Cond: (c_1.child_index = '5000'::numeric)"
"                                Buffers: shared read=3"
"        ->  Custom Scan (RuntimeAppend)  (cost=0.43..8.45 rows=1 width=84) (actual time=0.299..0.300 rows=1 loops=100)"
"              Output: p.parent_id, p.parent_data, p.partition_id"
"              Prune by: (c.partition_id = p.partition_id)"
"              Buffers: shared hit=152 read=248"
"              ->  Index Scan using parent_2_pkey on public.parent_2 p_1  (cost=0.43..8.45 rows=1 width=84) (actual time=0.293..0.293 rows=1 loops=42)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Index Cond: (p_1.parent_id = c.parent_id)"
"                    Filter: (c.partition_id = p_1.partition_id)"
"                    Buffers: shared hit=62 read=106"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p_2  (cost=0.43..8.45 rows=1 width=84) (actual time=0.301..0.302 rows=1 loops=58)"
"                    Output: p_2.parent_id, p_2.parent_data, p_2.partition_id"
"                    Index Cond: (p_2.parent_id = c.parent_id)"
"                    Filter: (c.partition_id = p_2.partition_id)"
"                    Buffers: shared hit=90 read=142"
"Planning time: 14.037 ms"
"Execution time: 48.413 ms"

Ключ партиционирования из таблицы:

select *
from public.parent p
where (p.partition_id, p.parent_id) in (select r.partition_id, r.parent_id from public.result r where r.result_id = 50)
limit 100;

postgres:

"Limit  (cost=7049.27..9084.40 rows=100 width=84) (actual time=593.094..692.189 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=506 read=5583"
...
"Planning time: 8.554 ms"
"Execution time: 692.468 ms"

Полностью
"Limit  (cost=7049.27..9084.40 rows=100 width=84) (actual time=593.094..692.189 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=506 read=5583"
"  ->  Nested Loop  (cost=7049.27..192836.06 rows=9129 width=84) (actual time=593.091..692.134 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id"
"        Buffers: shared hit=506 read=5583"
"        ->  HashAggregate  (cost=7049.27..7140.51 rows=9124 width=11) (actual time=592.396..592.519 rows=100 loops=1)"
"              Output: r.partition_id, r.parent_id"
"              Group Key: r.partition_id, r.parent_id"
"              Buffers: shared read=5094"
"              ->  Bitmap Heap Scan on public.result r  (cost=199.17..7003.62 rows=9129 width=11) (actual time=10.049..576.423 rows=10051 loops=1)"
"                    Output: r.result_id, r.parent_id, r.partition_id"
"                    Recheck Cond: (r.result_id = '50'::numeric)"
"                    Heap Blocks: exact=5056"
"                    Buffers: shared read=5094"
"                    ->  Bitmap Index Scan on idx_result  (cost=0.00..196.89 rows=9129 width=0) (actual time=9.087..9.087 rows=10051 loops=1)"
"                          Index Cond: (r.result_id = '50'::numeric)"
"                          Buffers: shared read=38"
"        ->  Append  (cost=0.00..20.31 rows=4 width=84) (actual time=0.733..0.994 rows=1 loops=100)"
"              Buffers: shared hit=506 read=489"
"              ->  Seq Scan on public.parent p  (cost=0.00..0.00 rows=1 width=282) (actual time=0.000..0.000 rows=0 loops=100)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: ((r.partition_id = p.partition_id) AND (r.parent_id = p.parent_id))"
"              ->  Index Scan using parent_1_pkey on public.parent_1 p_1  (cost=0.43..6.77 rows=1 width=84) (actual time=0.315..0.316 rows=0 loops=100)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Index Cond: (p_1.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = p_1.partition_id)"
"                    Buffers: shared hit=170 read=158"
"              ->  Index Scan using parent_2_pkey on public.parent_2 p_2  (cost=0.43..6.77 rows=1 width=84) (actual time=0.355..0.358 rows=0 loops=99)"
"                    Output: p_2.parent_id, p_2.parent_data, p_2.partition_id"
"                    Index Cond: (p_2.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = p_2.partition_id)"
"                    Buffers: shared hit=168 read=170"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p_3  (cost=0.43..6.77 rows=1 width=84) (actual time=0.318..0.319 rows=0 loops=99)"
"                    Output: p_3.parent_id, p_3.parent_data, p_3.partition_id"
"                    Index Cond: (p_3.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = p_3.partition_id)"
"                    Buffers: shared hit=168 read=161"
"Planning time: 8.554 ms"
"Execution time: 692.468 ms"

pg_pathman:

"Limit  (cost=7049.70..7728.77 rows=100 width=282) (actual time=531.946..564.520 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=137 read=5358"
...
"Planning time: 8.863 ms"
"Executiontime: 564.741 ms"

Полностью
"Limit  (cost=7049.70..7728.77 rows=100 width=282) (actual time=531.946..564.520 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id"
"  Buffers: shared hit=137 read=5358"
"  ->  Nested Loop  (cost=7049.70..69008.61 rows=9124 width=282) (actual time=531.943..564.483 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id"
"        Buffers: shared hit=137 read=5358"
"        ->  HashAggregate  (cost=7049.27..7140.51 rows=9124 width=11) (actual time=531.407..531.479 rows=100 loops=1)"
"              Output: r.partition_id, r.parent_id"
"              Group Key: r.partition_id, r.parent_id"
"              Buffers: shared read=5094"
"              ->  Bitmap Heap Scan on public.result r  (cost=199.17..7003.62 rows=9129 width=11) (actual time=11.477..518.114 rows=10051 loops=1)"
"                    Output: r.result_id, r.parent_id, r.partition_id"
"                    Recheck Cond: (r.result_id = '50'::numeric)"
"                    Heap Blocks: exact=5056"
"                    Buffers: shared read=5094"
"                    ->  Bitmap Index Scan on idx_result  (cost=0.00..196.89 rows=9129 width=0) (actual time=10.655..10.655 rows=10051 loops=1)"
"                          Index Cond: (r.result_id = '50'::numeric)"
"                          Buffers: shared read=38"
"        ->  Custom Scan (RuntimeAppend)  (cost=0.43..6.77 rows=1 width=84) (actual time=0.320..0.322 rows=1 loops=100)"
"              Output: p.parent_id, p.parent_data, p.partition_id"
"              Prune by: (r.partition_id = p.partition_id)"
"              Buffers: shared hit=137 read=264"
"              ->  Index Scan using parent_1_pkey on public.parent_1 p_1  (cost=0.43..6.77 rows=1 width=84) (actual time=0.329..0.330 rows=1 loops=28)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Index Cond: (p_1.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = p_1.partition_id)"
"                    Buffers: shared hit=37 read=75"
"              ->  Index Scan using parent_2_pkey on public.parent_2 p_2  (cost=0.43..6.77 rows=1 width=84) (actual time=0.313..0.317 rows=1 loops=40)"
"                    Output: p_2.parent_id, p_2.parent_data, p_2.partition_id"
"                    Index Cond: (p_2.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = p_2.partition_id)"
"                    Buffers: shared hit=56 read=105"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p_3  (cost=0.43..6.77 rows=1 width=84) (actual time=0.317..0.317 rows=1 loops=32)"
"                    Output: p_3.parent_id, p_3.parent_data, p_3.partition_id"
"                    Index Cond: (p_3.parent_id = r.parent_id)"
"                    Filter: (r.partition_id = p_3.partition_id)"
"                    Buffers: shared hit=44 read=84"
"Planning time: 8.863 ms"
"Executiontime: 564.741 ms"

Первый запрос не принёс неожиданностей, все, как и у ранее рассмотренных. А вот во втором сработало расширение, которое сократило количество рассмотренных блоков почти в два раза. Тут можно увидеть классический случай partition pruning: в шаге RuntimeAppend пробег по двум партициям состоялся ровно сто раз (42 + 58), то есть поиск осуществлялся строго по тем партициям, которые были нужны. В третьем же запросе отсечение срабатывает корректно, но количество чтений не сокращается в разы из-за того, что в таблице результатов на конкретное значение приходится слишком много результатов. Их сокращение до требуемых ста производится уже после. С другой стороны, для нормальной БД будет всё хорошо, обычные таблицы там не должны быть большими (иначе их партиционируют), а отсечение ненужных партиций на больших таблицах даст существенное сокращение просмотренных блоков.

Запрос с exists условием сработал полностью аналогично, вот его пример:

select *
from public.parent p
where exists (select 1 from public.child c where p.partition_id = c.partition_id and p.parent_id = c.parent_id and c.child_index = 5000 and c.partition_id in (2, 3))
limit 100;

7.5. Запрос по 30-ти партициям.


Меня очень огорчило, что RuntimeAppend часто не срабатывает в случае с двумя константами в условии на ключ партиционирования и я решил проверить запрос на большем количестве партиций. Была создана схема с теми же таблицами и 30-тью партициями в каждой с расширением pg_pathman. Туда были закачены данные и по методике протестирован следующий запрос из 7.1:


select *
from public.parent p, public.child c
where c.partition_id = p.partition_id and c.parent_id = p.parent_id and p.partition_id in (2, 3)
limit 100;

Его план:

"Limit  (cost=17.38..294.92 rows=100 width=628) (actual time=31.483..33.538 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"  Buffers: shared hit=2 read=131"
...
"Planning time: 48.911 ms"
"Executiontime: 34.143 ms"

Полностью
"Limit  (cost=17.38..294.92 rows=100 width=628) (actual time=31.483..33.538 rows=100 loops=1)"
"  Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"  Buffers: shared hit=2 read=131"
"  ->  Merge Join  (cost=17.38..1318082.71 rows=474913 width=628) (actual time=31.482..33.514 rows=100 loops=1)"
"        Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"        Merge Cond: (p.parent_id = c.parent_id)"
"        Join Filter: (p.partition_id = c.partition_id)"
"        Buffers: shared hit=2 read=131"
"        ->  Merge Append  (cost=0.85..28624.08 rows=666900 width=84) (actual time=2.625..2.651 rows=48 loops=1)"
"              Sort Key: p.parent_id"
"              Buffers: shared read=8"
"              ->  Index Scan using parent_2_pkey on public.parent_2 p  (cost=0.42..14289.87 rows=332887 width=84) (actual time=1.362..1.370 rows=28 loops=1)"
"                    Output: p.parent_id, p.parent_data, p.partition_id"
"                    Filter: (p.partition_id = ANY ('{2,3}'::numeric[]))"
"                    Buffers: shared read=4"
"              ->  Index Scan using parent_3_pkey on public.parent_3 p_1  (cost=0.42..14334.21 rows=334013 width=84) (actual time=1.262..1.272 rows=21 loops=1)"
"                    Output: p_1.parent_id, p_1.parent_data, p_1.partition_id"
"                    Filter: (p_1.partition_id = ANY ('{2,3}'::numeric[]))"
"                    Buffers: shared read=4"
"        ->  Materialize  (cost=13.49..1059692.47 rows=20004382 width=98) (actual time=28.789..30.453 rows=1369 loops=1)"
"              Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"              Buffers: shared hit=2 read=123"
"              ->  Merge Append  (cost=13.49..1009681.51 rows=20004382 width=98) (actual time=28.782..30.170 rows=1369 loops=1)"
"                    Sort Key: c.parent_id"
"                    Buffers: shared hit=2 read=123"
"                    ->  Index Scan using child_1_parent_id_idx on public.child_1 c  (cost=0.42..32766.35 rows=664299 width=97) (actual time=0.895..0.909 rows=47 loops=1)"
"                          Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_2_parent_id_idx on public.child_2 c_1  (cost=0.42..33032.19 rows=664696 width=97) (actual time=0.835..1.048 rows=62 loops=1)"
"                          Output: c_1.child_id, c_1.parent_id, c_1.child_data, c_1.child_index, c_1.partition_id"
"                          Buffers: shared read=5"
"                    ->  Index Scan using child_3_parent_id_idx on public.child_3 c_2  (cost=0.42..33625.31 rows=668577 width=98) (actual time=0.849..0.858 rows=39 loops=1)"
"                          Output: c_2.child_id, c_2.parent_id, c_2.child_data, c_2.child_index, c_2.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_4_parent_id_idx on public.child_4 c_3  (cost=0.42..32859.69 rows=668831 width=98) (actual time=0.746..0.772 rows=58 loops=1)"
"                          Output: c_3.child_id, c_3.parent_id, c_3.child_data, c_3.child_index, c_3.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_5_parent_id_idx on public.child_5 c_4  (cost=0.42..34070.24 rows=667684 width=98) (actual time=1.037..1.043 rows=44 loops=1)"
"                          Output: c_4.child_id, c_4.parent_id, c_4.child_data, c_4.child_index, c_4.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_6_parent_id_idx on public.child_6 c_5  (cost=0.42..34413.73 rows=668521 width=98) (actual time=1.248..1.255 rows=63 loops=1)"
"                          Output: c_5.child_id, c_5.parent_id, c_5.child_data, c_5.child_index, c_5.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_7_parent_id_idx on public.child_7 c_6  (cost=0.42..33858.17 rows=664992 width=97) (actual time=1.084..1.089 rows=31 loops=1)"
"                          Output: c_6.child_id, c_6.parent_id, c_6.child_data, c_6.child_index, c_6.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_8_parent_id_idx on public.child_8 c_7  (cost=0.42..33396.85 rows=666651 width=98) (actual time=0.674..0.804 rows=64 loops=1)"
"                          Output: c_7.child_id, c_7.parent_id, c_7.child_data, c_7.child_index, c_7.partition_id"
"                          Buffers: shared read=5"
"                    ->  Index Scan using child_9_parent_id_idx on public.child_9 c_8  (cost=0.42..33535.73 rows=665594 width=98) (actual time=1.142..1.148 rows=53 loops=1)"
"                          Output: c_8.child_id, c_8.parent_id, c_8.child_data, c_8.child_index, c_8.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_10_parent_id_idx on public.child_10 c_9  (cost=0.42..34376.76 rows=667885 width=97) (actual time=1.112..1.117 rows=36 loops=1)"
"                          Output: c_9.child_id, c_9.parent_id, c_9.child_data, c_9.child_index, c_9.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_11_parent_id_idx on public.child_11 c_10  (cost=0.42..34520.23 rows=667304 width=98) (actual time=0.875..0.883 rows=43 loops=1)"
"                          Output: c_10.child_id, c_10.parent_id, c_10.child_data, c_10.child_index, c_10.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_12_parent_id_idx on public.child_12 c_11  (cost=0.42..34516.73 rows=667192 width=97) (actual time=0.725..0.731 rows=37 loops=1)"
"                          Output: c_11.child_id, c_11.parent_id, c_11.child_data, c_11.child_index, c_11.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_13_parent_id_idx on public.child_13 c_12  (cost=0.42..34775.47 rows=665684 width=98) (actual time=0.799..0.807 rows=46 loops=1)"
"                          Output: c_12.child_id, c_12.parent_id, c_12.child_data, c_12.child_index, c_12.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_14_parent_id_idx on public.child_14 c_13  (cost=0.42..32759.51 rows=667829 width=97) (actual time=0.952..0.966 rows=51 loops=1)"
"                          Output: c_13.child_id, c_13.parent_id, c_13.child_data, c_13.child_index, c_13.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_15_parent_id_idx on public.child_15 c_14  (cost=0.42..33834.34 rows=666471 width=97) (actual time=0.713..0.719 rows=30 loops=1)"
"                          Output: c_14.child_id, c_14.parent_id, c_14.child_data, c_14.child_index, c_14.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_16_parent_id_idx on public.child_16 c_15  (cost=0.42..33276.75 rows=665648 width=98) (actual time=0.887..0.892 rows=40 loops=1)"
"                          Output: c_15.child_id, c_15.parent_id, c_15.child_data, c_15.child_index, c_15.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_17_parent_id_idx on public.child_17 c_16  (cost=0.42..34426.11 rows=666740 width=98) (actual time=0.820..0.835 rows=50 loops=1)"
"                          Output: c_16.child_id, c_16.parent_id, c_16.child_data, c_16.child_index, c_16.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_18_parent_id_idx on public.child_18 c_17  (cost=0.42..31990.71 rows=665008 width=98) (actual time=0.780..0.794 rows=52 loops=1)"
"                          Output: c_17.child_id, c_17.parent_id, c_17.child_data, c_17.child_index, c_17.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_19_parent_id_idx on public.child_19 c_18  (cost=0.42..34141.36 rows=669231 width=97) (actual time=0.782..0.792 rows=42 loops=1)"
"                          Output: c_18.child_id, c_18.parent_id, c_18.child_data, c_18.child_index, c_18.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_20_parent_id_idx on public.child_20 c_19  (cost=0.42..33065.75 rows=666164 width=98) (actual time=1.134..1.143 rows=44 loops=1)"
"                          Output: c_19.child_id, c_19.parent_id, c_19.child_data, c_19.child_index, c_19.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_21_parent_id_idx on public.child_21 c_20  (cost=0.42..34236.36 rows=667822 width=98) (actual time=1.122..1.131 rows=53 loops=1)"
"                          Output: c_20.child_id, c_20.parent_id, c_20.child_data, c_20.child_index, c_20.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_22_parent_id_idx on public.child_22 c_21  (cost=0.42..33809.84 rows=664828 width=97) (actual time=1.334..1.344 rows=52 loops=1)"
"                          Output: c_21.child_id, c_21.parent_id, c_21.child_data, c_21.child_index, c_21.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_23_parent_id_idx on public.child_23 c_22  (cost=0.42..34169.35 rows=668321 width=98) (actual time=1.478..1.482 rows=30 loops=1)"
"                          Output: c_22.child_id, c_22.parent_id, c_22.child_data, c_22.child_index, c_22.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_24_parent_id_idx on public.child_24 c_23  (cost=0.42..31693.94 rows=669014 width=98) (actual time=0.936..0.940 rows=30 loops=1)"
"                          Output: c_23.child_id, c_23.parent_id, c_23.child_data, c_23.child_index, c_23.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_25_parent_id_idx on public.child_25 c_24  (cost=0.42..34224.95 rows=666004 width=98) (actual time=0.869..0.881 rows=60 loops=1)"
"                          Output: c_24.child_id, c_24.parent_id, c_24.child_data, c_24.child_index, c_24.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_26_parent_id_idx on public.child_26 c_25  (cost=0.42..32132.54 rows=665401 width=97) (actual time=1.129..1.138 rows=53 loops=1)"
"                          Output: c_25.child_id, c_25.parent_id, c_25.child_data, c_25.child_index, c_25.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_27_parent_id_idx on public.child_27 c_26  (cost=0.42..34930.21 rows=668930 width=97) (actual time=1.307..1.312 rows=42 loops=1)"
"                          Output: c_26.child_id, c_26.parent_id, c_26.child_data, c_26.child_index, c_26.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_28_parent_id_idx on public.child_28 c_27  (cost=0.42..33712.49 rows=666743 width=98) (actual time=0.782..0.793 rows=39 loops=1)"
"                          Output: c_27.child_id, c_27.parent_id, c_27.child_data, c_27.child_index, c_27.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_29_parent_id_idx on public.child_29 c_28  (cost=0.42..34281.07 rows=665822 width=98) (actual time=0.879..0.886 rows=42 loops=1)"
"                          Output: c_28.child_id, c_28.parent_id, c_28.child_data, c_28.child_index, c_28.partition_id"
"                          Buffers: shared read=4"
"                    ->  Index Scan using child_30_parent_id_idx on public.child_30 c_29  (cost=0.42..33248.03 rows=666496 width=98) (actual time=0.816..0.934 rows=65 loops=1)"
"                          Output: c_29.child_id, c_29.parent_id, c_29.child_data, c_29.child_index, c_29.partition_id"
"                          Buffers: shared hit=2 read=5"
"Planning time: 48.911 ms"
"Executiontime: 34.143 ms"

Штатное отсечение сработало на parent, на таблице child к сожалению не сработало расширение pg_pathman, что с моей точки зрения неправильно.

8. Выводы


1) Хотя штатное партиционирование таблиц довольно куцее, но оно может отсекать партиции для запросов с одним константным значением ключа партиционирования, но не для подзапросов в select. Для двух и более констант ключа партиционирования в запросе отсечение срабатывает лишь для таблицы, на которую поставлено это условие. То есть это партиционирование годится лишь для простых случаев партиционирования, либо для сложных, но тогда придётся сильно оптимизировать запросы, способы их выполнения, разбивать их на части, то есть сильно менять взаимодействие с бд.

2) Вставка с расширением pg_pathman ускоряется примерно в два — три раза. При увеличении количества партиций, я думаю, ситуация не изменится.

3) Запросы в рассмотренных примерах за счёт динамического отсечения партиций уменьшают количество потребных блоков в два раза, но при увеличении количества партиций до обычного для большой бд (100-1000)разница в потребных ресурсах будет сильно больше.

4) Расширение pg_pathman, когда срабатывает шаг RuntimeAppend, серьёзно ускоряет запросы и вставку и использовать его для партиционирования таблиц НАДО.

5) Но, существует проблема с тем, что шаг RuntimeAppend не всегда срабатывает и запросы работают, как и раньше. То есть пока существуют проблемы с планированием запросов. По опыту на оракле данный шаг всегда уместен, когда есть информация о необходимой партиции. Но тут пока часто он не срабатывает, может быть, дело в том, что есть ещё ошибки в вычислении стоимости планов запроса. К сожалению, пока в постгресе нет трассировки выбора плана запроса и точную причину установить я не смогу.

6) Я обращу внимание разработчиков данного расширения на найденный баг с размером записи и странным выбором планов. Так как баги они фиксят регулярно, то надежда на исправление этих проблем остаётся.

В завершение, хочу сказать разработчикам этого расширения большое спасибо за их работу и пожелать им дальнейших успехов!
Tags:
Hubs:
+21
Comments 6
Comments Comments 6

Articles

Information

Website
www.infowatch.ru
Registered
Founded
Employees
201–500 employees
Location
Россия