Add BRIN support for spoint and sbox #8

gbroccolo · 2018-01-15T10:02:16Z

As discussed in the mailing list, I opened the PR here. Support for searches within spherical circles will be added in the future.

akorotkov

As I can see, you are indenting using spaces. pgsphere follows the PostgreSQL coding style guidelines. Could you please indent this code using tabs? You can just run pgindent over the new source files.

akorotkov · 2018-01-16T09:25:01Z

Makefile

             gnomo.o

 EXTENSION   = pg_sphere
 DATA_built  = pg_sphere--1.0.sql
 DOCS        = README.pg_sphere COPYRIGHT.pg_sphere
 REGRESS     = init tables points euler circle line ellipse poly path box index \
-              contains_ops contains_ops_compat bounding_box_gist gnomo
+              contains_ops contains_ops_compat bounding_box_gist gnomo spoint_brin


Should the spoint_brin test be run only under BRIN_CHECK?

akorotkov · 2018-01-16T09:28:52Z

brin.c

+}
+
+//
+//  Define operators procedures below


In PostgreSQL coding style, we avoid C++-style comments.

akorotkov · 2018-01-16T09:31:15Z

In general this looks very good, but please take a look at the minor notes in my review.

gbroccolo · 2018-01-16T10:50:01Z

Thank you @akorotkov for the review. I'll fix the minor notes ASAP and let you know when it's done.

gbroccolo · 2018-02-27T14:18:25Z

Hi @akorotkov, I've fixed what you reported: I ran pgindent on the source code and corrected the Makefile so that the BRIN regression test is executed only on suitable PostgreSQL versions. Sorry for my late reply, it was a busy period. It should now be ready to be merged, but let me know if further fixes are needed.

esabol · 2018-02-27T15:32:41Z

You still have a tab on line 34 of the Makefile (where you added $(BRIN_REGRESS)) that needs to be changed to spaces, I think. Compare with line 33 which is indented with only spaces. Either that or line 33 should have been indented with a tab. Not sure.

This is a fantastic contribution!

gbroccolo · 2018-02-27T19:26:45Z

Hi @esabol, indeed I ran pgindent only on the source code, not on the Makefile. It works as it is, but I can fix the style there too. Thank you for your review!

gbroccolo · 2018-03-04T16:32:52Z

I've also fixed the Makefile as per @esabol's suggestion: the other lines were indented with spaces, so I indented the added line (n. 34) with spaces too, for uniformity.

akorotkov · 2018-03-11T18:03:21Z

I think we also need to bump the extension version, so that existing users can upgrade. Another issue is instances that were pg_upgrade'd from pre-9.5. PostgreSQL doesn't offer a built-in mechanism that could add BRIN index support to them. But we can do that with a manual SQL script and describe its usage in the README.

gbroccolo · 2018-03-13T00:31:10Z

Hi @akorotkov I've added the following changes:

Updated the copyright to 2018
Bumped the version in the control file from 1.0 to 1.1
Bumped the version in the Makefile from 1.1.5 to 1.1.6
Included in utils/ a command to load BRIN support in case of upgrades from pre-9.5 releases through pg_upgrade
Added the above information both to the documentation and to the README file

esabol · 2018-05-10T20:11:20Z

It's been a while. Any progress?

gbroccolo · 2018-05-12T12:48:23Z

@esabol I've applied the fixes required by @akorotkov, adding a command to support upgrades from pre-9.5 releases through pg_upgrade (similarly to what I've done for BRINs in PostGIS) and improving the documentation too. This is ready for the final review.

In the meantime, I've also run some benchmarks on an artificial sample of spherical points, varying the size of the sample itself. I'm attaching here one of the plots I've obtained: it reports the time needed to select 1 spherical point included in a particular spherical sector using either BRIN or GiST indices, as the size of the whole sample increases. I kept the configured shared buffers in PostgreSQL low in order to better show how BRINs are useful for big datasets (note the change in behaviour at ~10M points).

esabol · 2018-05-12T14:17:36Z

Nice work. The performance degradation for less than ~10M points is disappointing, if I am being honest.

gbroccolo · 2018-05-12T19:58:11Z

@esabol it is expected that BRIN performance is lower than that of a tree index. A Block Range INdex keeps only a summary per range of pages, so the ranges retrieved by the index scan then have to be inspected sequentially to find the exact matches. In this sense, BRINs are more "lossy" than GiST indices. However, the larger the sample, the lower the cost of inspecting the range map becomes relative to the cost of inspecting a GiST index. BRINs are designed specifically for big datasets, rather than as a general alternative to GiST indices.
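To make the "lossy" behaviour described above concrete, here is a minimal Python sketch. It is not pgSphere or PostgreSQL code: it assumes one-dimensional keys and a (min, max) summary per block of rows, where the real index would keep a bounding box per range of heap pages. The function names `build_summaries` and `lossy_search` are made up for this illustration.

```python
from typing import List, Tuple


def build_summaries(values: List[float], rows_per_range: int) -> List[Tuple[float, float]]:
    """Build one (min, max) summary per consecutive block of rows, in storage order."""
    return [
        (min(values[i:i + rows_per_range]), max(values[i:i + rows_per_range]))
        for i in range(0, len(values), rows_per_range)
    ]


def lossy_search(values, summaries, rows_per_range, lo, hi):
    """Return (matches, rows_inspected) for the query lo <= v <= hi."""
    matches, inspected = [], 0
    for r, (rmin, rmax) in enumerate(summaries):
        if rmax < lo or rmin > hi:
            continue  # the summary proves this range cannot contain a match
        start = r * rows_per_range
        block = values[start:start + rows_per_range]
        inspected += len(block)  # every row of a candidate range is rechecked
        matches.extend(v for v in block if lo <= v <= hi)
    return matches, inspected


# Hypothetical data: 1000 keys stored in key order, 100 rows per range.
values = [float(i) for i in range(1000)]
summaries = build_summaries(values, rows_per_range=100)
matches, inspected = lossy_search(values, summaries, 100, 250.0, 255.0)
```

In this toy case the index holds only ten summaries, but a whole block of rows has to be rechecked to return a handful of matches. That is the trade-off: a very small index in exchange for extra sequential inspection.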

Furthermore, consider that the sample used to generate the plot is an artificial one, running on a small DB server. It would be good to see how the BRINs perform with a real dataset.

esabol · 2018-05-12T20:04:48Z

I guess I'd just hoped that BRINs would see a benefit at ~1M points instead of ~10M. Many of our largest catalogs are in the millions of rows, not tens of millions. Of course, that will change over time. Catalogs only keep getting bigger.

gbroccolo · 2018-05-13T14:36:40Z

@esabol Block Range INdexing is configurable: you can set the number of data pages included in each range (in the above example, I used 16 pages per range) as low as you need to get close to the desired performance. This increases the maintenance effort for the index (creation time, final index size, impact on insertions, time needed for resummarization), bringing it closer to that of a GiST index, but performance on small datasets also becomes more similar, while the advantages for larger ones are kept. The parameter that controls this behaviour is pages_per_range; by tuning it you can start to see benefits for datasets smaller than ~10M rows too. Furthermore, this "breakeven" point depends strongly on the underlying hardware, the specific data content, and so on. You should check the behaviour in your specific case.
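The pages_per_range trade-off can be sketched with a small back-of-the-envelope model in Python. It rests on strong assumptions that are not taken from the benchmark: the table is perfectly clustered, so a point query hits exactly one range, and index size is approximated by the number of range summaries. The function `brin_tradeoff` and the table size are hypothetical.

```python
def brin_tradeoff(n_pages: int, pages_per_range: int, target_page: int):
    """Return (index_entries, heap_pages_scanned) for a point query.

    Assumes perfectly clustered data, so only the range containing
    target_page has to be scanned.
    """
    index_entries = -(-n_pages // pages_per_range)  # ceiling division
    first_page = (target_page // pages_per_range) * pages_per_range
    heap_pages_scanned = min(pages_per_range, n_pages - first_page)
    return index_entries, heap_pages_scanned


# Hypothetical 1024-page table; 128 is PostgreSQL's default pages_per_range.
results = {ppr: brin_tradeoff(1024, ppr, target_page=500) for ppr in (128, 16, 1)}
```

Under these assumptions, going from 128 to 16 pages per range gives eight times as many summaries to maintain and an eight times smaller recheck per matching range. Real data is rarely perfectly clustered, so measured numbers will differ.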

akorotkov · 2018-05-17T09:51:04Z

@gbroccolo, right. BRIN performance depends strongly on how well the keys correlate with the layout of tuples on heap pages. We can imagine two corner cases for BRIN performance:

Worst case: the key distribution is random. The MBR for a page is almost as wide as the MBR for the whole dataset.
Best case: heap tuples are perfectly clustered by their keys, so tuples that lie near each other contain nearby keys. This could be achieved artificially by clustering the heap by a GiST index.

So, as I understand it, the experiment you did is close to the first scenario. In real life the distribution could be different, and it is expected to be somewhere between these two corner cases.
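The two corner cases can be illustrated with a short Python sketch. As a simplifying assumption, a one-dimensional interval stands in for the MBR, and the keys are uniform random numbers rather than spherical points. The helper `mean_relative_width` is a made-up name.

```python
import random


def mean_relative_width(values, rows_per_range):
    """Average width of the per-range [min, max] interval, relative to the full key span."""
    full = max(values) - min(values)
    widths = []
    for i in range(0, len(values), rows_per_range):
        block = values[i:i + rows_per_range]
        widths.append((max(block) - min(block)) / full)
    return sum(widths) / len(widths)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
keys = [rng.random() for _ in range(10_000)]

random_layout = mean_relative_width(keys, 100)             # worst case: random order
clustered_layout = mean_relative_width(sorted(keys), 100)  # best case: clustered by key
```

With a random layout each range summary spans nearly the whole key space, so almost every range is a candidate for any query. With the clustered layout each summary covers only a thin slice and most ranges can be skipped.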

I would note that I found the experiment you did quite surprising. BRIN is expected to be slower on selective queries (a low fraction of tuples is selected) and faster on non-selective queries (a high fraction of tuples is selected). So, for me, the most straightforward benchmark would be the following: for some large enough dataset, run queries with different selectivity. Once the fraction of selected tuples exceeds some threshold, BRIN should outperform GiST.

As I understand your experiment, every query selects 1 point, and we can see serious degradation of GiST near a dataset size of 10^7. I can explain it as follows. Below 10^7 points, the GiST index fits in the OS cache. Beyond that, lookups in the GiST index become more expensive because of random disk seeks, while the BRIN index performs better thanks to its smaller size and sequential disk access pattern. To help me validate my thoughts, could you please provide more details on the hardware you used during the benchmark: amount of RAM, type of disk (hard disk or SSD), PostgreSQL settings, etc.?

akorotkov · 2018-05-17T10:01:37Z

Regarding pull request itself, I've some more notes:

Upgrading from pre-9.5 versions needs ALTER EXTENSION ... ADD ... commands so that the new objects are linked to the extension.
You deleted sscan.c. Keeping this file in git helps to support building on systems that don't have flex installed. So please don't remove it from the repository, even though make clean removes it.

ghost · 2018-05-17T12:54:27Z

Please excuse me, regrettably sscan.c had been prematurely deleted by me on the master branch. I have now reverted the deletion.

akorotkov · 2018-05-17T12:58:22Z

@mnullmei, no problem, thanks.

gbroccolo · 2018-05-21T21:51:57Z

Hi @akorotkov, let me give further details about my benchmarks: I used an Ubuntu Xenial VM with 2GB RAM. The storage was SSD-based, but bear in mind that it was accessed through a virtualized driver. I used PostgreSQL 9.6 + pgSphere built from this branch, with the default PostgreSQL configuration.

The artificial dataset was inserted into the DB so that spatially close objects were stored as close together as possible in the data pages on disk, so the benchmark was actually closer to the second scenario you mentioned in your reply than to the first one.

Apart from the graph I attached in my previous reply, I've produced several others, in order to inspect data access to disk, how performance changes with the selectivity of the query, and so on. I attach here the graph you requested, reporting the time needed to select a variable number of points included in a spherical sector, from 1 single point to 1 million. BRINs start to perform closer to GiST indices at ~O(100k) points, so their relative performance improves as the query becomes less and less selective.

To further validate your thoughts, I'm also attaching the number of index and data blocks read from storage as the selectivity of the query decreases. As you mentioned, the reduced size of the index allows it to be completely cached in the database's shared buffers in memory. As a consequence, no index block needs to be read from disk in the BRIN case, and the number of data blocks read from disk is also considerably lower than in the GiST case as the selectivity of the query decreases.

Regarding the PR: let me rebase on master in order to include the fix from @mnullmei, and have a look at the ALTER EXTENSION stuff.

gbroccolo · 2018-05-27T15:39:18Z

@akorotkov I rebased the PR branch to the latest master, and added the ALTER EXTENSION commands to link the objects to the extension.

esabol · 2023-08-09T18:39:54Z

If anyone is interested in reviving this feature, I recommend submitting a PR to the currently active pgsphere repo: https://github.com/postgrespro/pgsphere

gbroccolo · 2023-08-14T20:56:20Z

Hi @esabol, to be fair I no longer know whether the PR would still be useful after 5 years. Maybe catalogs with ~O(10M) rows are now becoming more common. Anyhow, it's a pity, because the PR was ready 5 years ago, with all the fixes and changes that had been requested. Before starting to work on this again, my question is: would the PR (considering the limitations of Block Range INdexing) still be of interest for pgsphere?

esabol · 2023-08-14T23:02:31Z

@gbroccolo wrote:

Before starting working on this again, my question is: would the PR (considering the limitations of the Block Range INdexing) be still of interest for pgsphere?

I don't know that I can answer that for all pgsphere users, but I do agree that ~O(10M) datasets are more common these days. That said, I have opened an issue to discuss the matter in the new repo.

postgrespro/pgsphere#52

Looking at the changes in this PR, I don't think it would be that difficult to get a similar PR working with the new pgsphere repo. It's kind of up to you, I think.

esabol · 2023-08-16T19:30:48Z

Update: There's now a working PR for adding BRIN support to the https://github.com/postgrespro/pgsphere/ repo that is based on @gbroccolo's PR here with essentially just some updates to the Makefile.

postgrespro/pgsphere#55

Add BRIN support for spoint and sbox

efcbfb4

akorotkov reviewed Jan 16, 2018

Adjust style using pgindent, execute regression test for BRIN code just for PG>9.5

f467dbd

Adjust indentation of Makefile

bedfd75

gbroccolo added 2 commits March 13, 2018 00:15

Update copyright; increase pgSphere version; add a script to be used in case of upgrades from pre9.5 releases

0cdeabf

upgrade version in control file

d3209ea

gbroccolo added 3 commits May 21, 2018 23:04

Rebase on master branch of the forked repo

6753e96

Merge branch 'master' into master

d8dbfbe

Add the ALTER EXTENSION commands to link the funcs, ops, ops classes and ops families to the extension

6014ad7

df7cb mentioned this pull request Aug 9, 2023

Add BRIN support for spoint and sbox pgsphere/pgsphere#1

Closed

esabol mentioned this pull request Aug 14, 2023

Add BRIN support for spoint and sbox? postgrespro/pgsphere#52

Closed

vitcpp mentioned this pull request Aug 16, 2023

Add BRIN support for spoint and sbox postgrespro/pgsphere#55

Merged

gbroccolo closed this by deleting the head repository Aug 28, 2023
