From b2435349f78dec641c236f6c87fca76b23b4e83e Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Sun, 22 Nov 2020 07:50:52 -0800 Subject: [PATCH 01/31] Correct little typos MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Make sure variable references match their declarations, clarify fields (vis-à-vis "member variables"). --- RefPtr.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/RefPtr.md b/RefPtr.md index 0bae22a..0359f4b 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -26,7 +26,7 @@ A value refers to the "content" or meaning that occupies the address of an objec ```cpp S* OwnsParam(T* t1) { - t->Release(); + t1->Release(); } ``` @@ -63,7 +63,7 @@ Or more likely it looks like this: T* RetOwner(int) { T *t = new T(...); - do_stuff(); + do_stuff(t); return t; } @@ -74,7 +74,7 @@ A caller should look like this: ```cpp T* t = RetOwner(42); -do_stuff(); +do_stuff(t); t->Release(); ``` @@ -89,7 +89,7 @@ struct S { S(T* t) : t_(t) {} ~S() { - if (t) + if (t_) t_->Release(); } } @@ -105,7 +105,7 @@ Often enough, ORCA also does a variant } ``` -We have 324 such occurrences of `SafeRelease` called on a member variable from destructors in ORCA `.cpp` files. +We have 324 such occurrences of `SafeRelease` called on a field from destructors in ORCA `.cpp` files. A caller that constructs an object of type `S` should look like this @@ -261,7 +261,7 @@ pointer foo(T *parm1, U parm2) { We have 242 occurrences of functions returning a parameter in ORCA `.cpp` files (and a lot more in headers). ## base.memOwnSafeRelease -A non-static member variable of a struct (or class) that is released in its destructor is an owner. i.e. when we match: +A non-static field (data member) of a struct (or class) that is released in its destructor is an owner. i.e. when we match: ```cpp struct S { @@ -400,9 +400,9 @@ private: T* p_ = nullptr; }; -// helper analogous to std::make_unique -template -RefPtr make_ref(Args&&... args) { return {new T(args...);} } +// helper analogous to std::allocate_shared +template +RefPtr allocate_ref(MP mp, Args&&... args) { return {GPOS_NEW(mp) T(args...);} } ``` ## Circular dependency? @@ -441,7 +441,7 @@ Here the caller is forced to take ownership, otherwise it leaks. Compare to the ```cpp RefPtr Func1() { - auto t = gpos::make_ref(arg1, arg2); + auto t = gpos::allocate_ref(arg1, arg2); return t; } ``` From 9cbe5a2127ae734c47caa4b4b110ec4a6433f45d Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Mon, 8 Oct 2018 13:26:42 -0700 Subject: [PATCH 02/31] local Ruby set up for Github Pages --- .gitignore | 3 + .ruby-version | 1 + Gemfile | 2 + Gemfile.lock | 248 ++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 254 insertions(+) create mode 100644 .gitignore create mode 100644 .ruby-version create mode 100644 Gemfile create mode 100644 Gemfile.lock diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..45c1505 --- /dev/null +++ b/.gitignore @@ -0,0 +1,3 @@ +_site +.sass-cache +.jekyll-metadata diff --git a/.ruby-version b/.ruby-version new file mode 100644 index 0000000..73462a5 --- /dev/null +++ b/.ruby-version @@ -0,0 +1 @@ +2.5.1 diff --git a/Gemfile b/Gemfile new file mode 100644 index 0000000..37f5eaa --- /dev/null +++ b/Gemfile @@ -0,0 +1,2 @@ +source 'https://rubygems.org' +gem 'github-pages', group: :jekyll_plugins diff --git a/Gemfile.lock b/Gemfile.lock new file mode 100644 index 0000000..ba85901 --- /dev/null +++ b/Gemfile.lock @@ -0,0 +1,248 @@ +GEM + remote: https://rubygems.org/ + specs: + activesupport (4.2.10) + i18n (~> 0.7) + minitest (~> 5.1) + thread_safe (~> 0.3, >= 0.3.4) + tzinfo (~> 1.1) + addressable (2.5.2) + public_suffix (>= 2.0.2, < 4.0) + coffee-script (2.4.1) + coffee-script-source + execjs + coffee-script-source (1.11.1) + colorator (1.1.0) + commonmarker (0.17.13) + ruby-enum (~> 0.5) + concurrent-ruby (1.0.5) + dnsruby (1.61.2) + addressable (~> 2.5) + em-websocket (0.5.1) + eventmachine (>= 0.12.9) + http_parser.rb (~> 0.6.0) + ethon (0.11.0) + ffi (>= 1.3.0) + eventmachine (1.2.7) + execjs (2.7.0) + faraday (0.15.3) + multipart-post (>= 1.2, < 3) + ffi (1.9.25) + forwardable-extended (2.6.0) + gemoji (3.0.0) + github-pages (192) + activesupport (= 4.2.10) + github-pages-health-check (= 1.8.1) + jekyll (= 3.7.4) + jekyll-avatar (= 0.6.0) + jekyll-coffeescript (= 1.1.1) + jekyll-commonmark-ghpages (= 0.1.5) + jekyll-default-layout (= 0.1.4) + jekyll-feed (= 0.10.0) + jekyll-gist (= 1.5.0) + jekyll-github-metadata (= 2.9.4) + jekyll-mentions (= 1.4.1) + jekyll-optional-front-matter (= 0.3.0) + jekyll-paginate (= 1.1.0) + jekyll-readme-index (= 0.2.0) + jekyll-redirect-from (= 0.14.0) + jekyll-relative-links (= 0.5.3) + jekyll-remote-theme (= 0.3.1) + jekyll-sass-converter (= 1.5.2) + jekyll-seo-tag (= 2.5.0) + jekyll-sitemap (= 1.2.0) + jekyll-swiss (= 0.4.0) + jekyll-theme-architect (= 0.1.1) + jekyll-theme-cayman (= 0.1.1) + jekyll-theme-dinky (= 0.1.1) + jekyll-theme-hacker (= 0.1.1) + jekyll-theme-leap-day (= 0.1.1) + jekyll-theme-merlot (= 0.1.1) + jekyll-theme-midnight (= 0.1.1) + jekyll-theme-minimal (= 0.1.1) + jekyll-theme-modernist (= 0.1.1) + jekyll-theme-primer (= 0.5.3) + jekyll-theme-slate (= 0.1.1) + jekyll-theme-tactile (= 0.1.1) + jekyll-theme-time-machine (= 0.1.1) + jekyll-titles-from-headings (= 0.5.1) + jemoji (= 0.10.1) + kramdown (= 1.17.0) + liquid (= 4.0.0) + listen (= 3.1.5) + mercenary (~> 0.3) + minima (= 2.5.0) + nokogiri (>= 1.8.2, < 2.0) + rouge (= 2.2.1) + terminal-table (~> 1.4) + github-pages-health-check (1.8.1) + addressable (~> 2.3) + dnsruby (~> 1.60) + octokit (~> 4.0) + public_suffix (~> 2.0) + typhoeus (~> 1.3) + html-pipeline (2.8.4) + activesupport (>= 2) + nokogiri (>= 1.4) + http_parser.rb (0.6.0) + i18n (0.9.5) + concurrent-ruby (~> 1.0) + jekyll (3.7.4) + addressable (~> 2.4) + colorator (~> 1.0) + em-websocket (~> 0.5) + i18n (~> 0.7) + jekyll-sass-converter (~> 1.0) + jekyll-watch (~> 2.0) + kramdown (~> 1.14) + liquid (~> 4.0) + mercenary (~> 0.3.3) + pathutil (~> 0.9) + rouge (>= 1.7, < 4) + safe_yaml (~> 1.0) + jekyll-avatar (0.6.0) + jekyll (~> 3.0) + jekyll-coffeescript (1.1.1) + coffee-script (~> 2.2) + coffee-script-source (~> 1.11.1) + jekyll-commonmark (1.2.0) + commonmarker (~> 0.14) + jekyll (>= 3.0, < 4.0) + jekyll-commonmark-ghpages (0.1.5) + commonmarker (~> 0.17.6) + jekyll-commonmark (~> 1) + rouge (~> 2) + jekyll-default-layout (0.1.4) + jekyll (~> 3.0) + jekyll-feed (0.10.0) + jekyll (~> 3.3) + jekyll-gist (1.5.0) + octokit (~> 4.2) + jekyll-github-metadata (2.9.4) + jekyll (~> 3.1) + octokit (~> 4.0, != 4.4.0) + jekyll-mentions (1.4.1) + html-pipeline (~> 2.3) + jekyll (~> 3.0) + jekyll-optional-front-matter (0.3.0) + jekyll (~> 3.0) + jekyll-paginate (1.1.0) + jekyll-readme-index (0.2.0) + jekyll (~> 3.0) + jekyll-redirect-from (0.14.0) + jekyll (~> 3.3) + jekyll-relative-links (0.5.3) + jekyll (~> 3.3) + jekyll-remote-theme (0.3.1) + jekyll (~> 3.5) + rubyzip (>= 1.2.1, < 3.0) + jekyll-sass-converter (1.5.2) + sass (~> 3.4) + jekyll-seo-tag (2.5.0) + jekyll (~> 3.3) + jekyll-sitemap (1.2.0) + jekyll (~> 3.3) + jekyll-swiss (0.4.0) + jekyll-theme-architect (0.1.1) + jekyll (~> 3.5) + jekyll-seo-tag (~> 2.0) + jekyll-theme-cayman (0.1.1) + jekyll (~> 3.5) + jekyll-seo-tag (~> 2.0) + jekyll-theme-dinky (0.1.1) + jekyll (~> 3.5) + jekyll-seo-tag (~> 2.0) + jekyll-theme-hacker (0.1.1) + jekyll (~> 3.5) + jekyll-seo-tag (~> 2.0) + jekyll-theme-leap-day (0.1.1) + jekyll (~> 3.5) + jekyll-seo-tag (~> 2.0) + jekyll-theme-merlot (0.1.1) + jekyll (~> 3.5) + jekyll-seo-tag (~> 2.0) + jekyll-theme-midnight (0.1.1) + jekyll (~> 3.5) + jekyll-seo-tag (~> 2.0) + jekyll-theme-minimal (0.1.1) + jekyll (~> 3.5) + jekyll-seo-tag (~> 2.0) + jekyll-theme-modernist (0.1.1) + jekyll (~> 3.5) + jekyll-seo-tag (~> 2.0) + jekyll-theme-primer (0.5.3) + jekyll (~> 3.5) + jekyll-github-metadata (~> 2.9) + jekyll-seo-tag (~> 2.0) + jekyll-theme-slate (0.1.1) + jekyll (~> 3.5) + jekyll-seo-tag (~> 2.0) + jekyll-theme-tactile (0.1.1) + jekyll (~> 3.5) + jekyll-seo-tag (~> 2.0) + jekyll-theme-time-machine (0.1.1) + jekyll (~> 3.5) + jekyll-seo-tag (~> 2.0) + jekyll-titles-from-headings (0.5.1) + jekyll (~> 3.3) + jekyll-watch (2.0.0) + listen (~> 3.0) + jemoji (0.10.1) + gemoji (~> 3.0) + html-pipeline (~> 2.2) + jekyll (~> 3.0) + kramdown (1.17.0) + liquid (4.0.0) + listen (3.1.5) + rb-fsevent (~> 0.9, >= 0.9.4) + rb-inotify (~> 0.9, >= 0.9.7) + ruby_dep (~> 1.2) + mercenary (0.3.6) + mini_portile2 (2.3.0) + minima (2.5.0) + jekyll (~> 3.5) + jekyll-feed (~> 0.9) + jekyll-seo-tag (~> 2.1) + minitest (5.11.3) + multipart-post (2.0.0) + nokogiri (1.8.5) + mini_portile2 (~> 2.3.0) + octokit (4.12.0) + sawyer (~> 0.8.0, >= 0.5.3) + pathutil (0.16.1) + forwardable-extended (~> 2.6) + public_suffix (2.0.5) + rb-fsevent (0.10.3) + rb-inotify (0.9.10) + ffi (>= 0.5.0, < 2) + rouge (2.2.1) + ruby-enum (0.7.2) + i18n + ruby_dep (1.5.0) + rubyzip (1.2.2) + safe_yaml (1.0.4) + sass (3.6.0) + sass-listen (~> 4.0.0) + sass-listen (4.0.0) + rb-fsevent (~> 0.9, >= 0.9.4) + rb-inotify (~> 0.9, >= 0.9.7) + sawyer (0.8.1) + addressable (>= 2.3.5, < 2.6) + faraday (~> 0.8, < 1.0) + terminal-table (1.8.0) + unicode-display_width (~> 1.1, >= 1.1.1) + thread_safe (0.3.6) + typhoeus (1.3.0) + ethon (>= 0.9.0) + tzinfo (1.2.5) + thread_safe (~> 0.1) + unicode-display_width (1.4.0) + +PLATFORMS + ruby + +DEPENDENCIES + github-pages + +BUNDLED WITH + 1.16.6 From f32224e45e0c01e9c72a833ab85eba9a095dabdd Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Mon, 8 Oct 2018 13:50:28 -0700 Subject: [PATCH 03/31] Partition table notes --- .gitignore | 4 + README.md | 3 + _config.yml | 2 + greenplum-partitions/README.md | 8 ++ .../alter-table-only-add-constraint.md | 79 ++++++++++++ .../disallow-named-constraint.md | 112 ++++++++++++++++++ 6 files changed, 208 insertions(+) create mode 100644 README.md create mode 100644 _config.yml create mode 100644 greenplum-partitions/README.md create mode 100644 greenplum-partitions/alter-table-only-add-constraint.md create mode 100644 greenplum-partitions/disallow-named-constraint.md diff --git a/.gitignore b/.gitignore index 45c1505..a19316f 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,7 @@ +.*.sw[mnop] + +/.idea + _site .sass-cache .jekyll-metadata diff --git a/README.md b/README.md new file mode 100644 index 0000000..ab76600 --- /dev/null +++ b/README.md @@ -0,0 +1,3 @@ +## Notes + +[collection of partition related thoughts](greenplum-partitions) diff --git a/_config.yml b/_config.yml new file mode 100644 index 0000000..ba594be --- /dev/null +++ b/_config.yml @@ -0,0 +1,2 @@ +exclude: + - .idea diff --git a/greenplum-partitions/README.md b/greenplum-partitions/README.md new file mode 100644 index 0000000..253eb08 --- /dev/null +++ b/greenplum-partitions/README.md @@ -0,0 +1,8 @@ +## Features +1. Treating the entire hierarchy of tables in a partition as one + 1. `CREATE TABLE` creates the whole hierarchy + 1. `DROP TABLE` drops every table in the hierarchy + +## Links +- [`ALTER TABLE ONLY pt ADD CONSTRAINT` proposal](alter-table-only-add-constraint.md) +- ["disallow named constraint" proposal](disallow-named-constraint.md) diff --git a/greenplum-partitions/alter-table-only-add-constraint.md b/greenplum-partitions/alter-table-only-add-constraint.md new file mode 100644 index 0000000..c1f0dc1 --- /dev/null +++ b/greenplum-partitions/alter-table-only-add-constraint.md @@ -0,0 +1,79 @@ +### `ALTER TABLE ONLY ... ADD CONSTRAINT ...` proposal +`ALTER TABLE ONLY ... ADD CONSTRAINT ...` should not recurse into partition children + +## Context +**GIVEN** we disallow named index-backed (think `UNIQUE`) constraints on partition table DDL + +**IN ORDER TO** provide precise control to `pg_dump` on a partition table created on Greenplum 6 with constraints + +we need to allow it in a limited form (temporarily relaxing the uniformity of `UNIQUE` constraints) during binary upgrade + +## Example + +1. On a Greenplum 6 cluster (after story ), the following DDL + + ```sql + CREATE TABLE pt ( + a integer, + b integer, + UNIQUE (a, b) + ) DISTRIBUTED BY (a) PARTITION BY RANGE(b) + ( + PARTITION alpha END (3), + PARTITION beta START (3) + ); + + SELECT conrelid::regclass, conname, conindid::regclass + FROM pg_constraint + WHERE + contype <> 'c' + AND + conrelid IN ( + SELECT 'pt'::regclass + UNION ALL + SELECT inhrelid + FROM pg_inherits + WHERE inhparent = 'pt'::regclass + ); + + ``` + + Can put the database in a unpredictable state. One possible end state [^footnote_1] is: + + ``` + conrelid | conname | conindid + ----------------+--------------------------+------------------------- + pt | pt_a_b_key1 | pt_a_b_key1 + pt_1_prt_alpha | pt_1_prt_alpha_a_b_key1 | pt_1_prt_alpha_a_b_key1 + pt_1_prt_beta | pt_1_prt_beta_a_b_key2 | pt_1_prt_beta_a_b_key2 + (3 rows) + ``` + +1. `pg_upgrade` currently dumps the following DDL for table `pt`: + + ```sql + CREATE TABLE pt (a integer, b integer) DISTRIBUTED BY (a) + PARTITION BY RANGE(b) (PARTITION alpha END (3), PARTITION beta START (3)); + + ALTER TABLE ONLY pt + ADD CONSTRAINT pt_a_b_key1 UNIQUE (a, b); + + ``` + + **FIXME** elaborate why this DDL is bad. + +## Footnote +1. [^footnote_1] pre-condition: + + ```sql + + -- given that we squat some names + CREATE TYPE pt_a_b_key AS (yolo int); + CREATE TYPE pt_1_prt_alpha_a_b_key AS (yolo int); + CREATE TYPE pt_1_prt_beta_a_b_key AS (yolo int); + CREATE TYPE pt_1_prt_beta_a_b_key1 AS (yolo int); + ``` + + +## FIXME +JZ's self-doubt: What should happen to regular `pg_dump`? Won't a regular restore become a problem too? diff --git a/greenplum-partitions/disallow-named-constraint.md b/greenplum-partitions/disallow-named-constraint.md new file mode 100644 index 0000000..3f1632c --- /dev/null +++ b/greenplum-partitions/disallow-named-constraint.md @@ -0,0 +1,112 @@ +## Disallow named constraints when creating / altering partition tables + +## DON'T START THIS STORY UNTIL WE AGREE ON +Jacob needs to fist fight Jesse on Monday (Oct 9, 2018) +But seriously, we need to be on the same page about: +1. (Jacob) What's wrong with named constraint? Can't we make it deterministic? +1. (Jesse) No that's gonna make `pg_dump` on unnamed constraints wrong + +## Context: Desired end state for index-backed constraints on partition tables +* `EXCLUDE`, `PRIMARY KEY` and `UNIQUE`constraints are named identically as their supporting indices +* **TODO** What should partition tables do? +* This should enable a simpler `pg_upgrade`: + 1. `pg_upgrade` should refuse to upgrade a cluster that has `UNIQUE` or `PRIMARY KEY` constraints (also `EXCLUDE` constraints if we don't fix 6.0...) that are differently named from the indices (probably file a story on figuring out how to detect that in 4.3 and 5.X) + 1. `pg_upgrade` can assume that the old cluster is "good" (constraint names = index names) + 1. (optional) `pg_upgrade` can generate a pair of SQL scripts: + - "drop constraints" SQL to suggest turning a non-conforming database to a "good" one + - a companion "recreate constraints" SQL script that puts the integrity constraints back (not guaranteed to have the same names though) + + +## Actual story +Given our vision on eventually identically naming index-backed constraints with their indices, we need to disallow: + +1. name colulmn constraints: + - `CREATE TABLE pt (a int, b int CONSTRAINT yolo UNIQUE) PARTITION BY range(b) (END (10));` +1. table constraint at creation: + - `CREATE TABLE pt (a int, b int, CONSTRAINT yolo UNIQUE(a,b)) DISTRIBUTED BY (a) PARTITION BY range(b) (END (10));` +1. table cosntraint post-creation: + - `ALTER TABLE pt ADD CONSTRAINT yolo UNIQUE (a,b);` + + +## Example to convince Jacob +Consider the following sequence of operation on Greenplum 5/6 + +```sql +-- given that we squat some names +CREATE TYPE pt_a_b_key AS (yolo int); +CREATE TYPE pt_1_prt_alpha_a_b_key AS (yolo int); +CREATE TYPE pt_1_prt_beta_a_b_key AS (yolo int); +CREATE TYPE pt_1_prt_beta_a_b_key1 AS (yolo int); + +-- and + +CREATE TABLE pt ( + a integer, + b integer, + UNIQUE (a, b) +) DISTRIBUTED BY (a) PARTITION BY RANGE(b) + ( + PARTITION alpha END (3), + PARTITION beta START (3) + ); + + +CREATE FUNCTION constraints_and_indices_for(IN t regclass, OUT relation regclass, OUT "constraint" name, OUT index regclass) RETURNS SETOF RECORD +LANGUAGE plpgsql STABLE STRICT AS +$fn$ + +BEGIN + + RETURN QUERY + SELECT con.conrelid::regclass, con.conname, ind.indexrelid::regclass + FROM pg_constraint con + INNER JOIN pg_depend dep ON + dep.classid = 'pg_class'::regclass + AND + dep.refclassid = 'pg_constraint'::regclass + AND + dep.refobjid = con.oid + AND + dep.deptype = 'i' + AND + con.contype <> 'c' + AND + con.conrelid IN ( + SELECT 'pt'::regclass + UNION ALL + SELECT inhrelid + FROM pg_inherits + WHERE inhparent = t + ) + INNER JOIN pg_index ind ON + dep.objid = indexrelid + AND + con.conrelid = indrelid + ; +END +$fn$; + +SELECT * FROM constraints_and_indices_for('pt'); + +-- erase my tracks +DROP TYPE pt_a_b_key; +DROP TYPE pt_1_prt_alpha_a_b_key; +DROP TYPE pt_1_prt_beta_a_b_key; +DROP TYPE pt_1_prt_beta_a_b_key1; +``` + +Depending on the database state before the `CREATE TABLE`, the index names are not predictable: + +``` + conrelid | conname | conindid +----------------+---------+------------------------- + pt | pt_key | pt_a_b_key1 + pt_1_prt_alpha | pt_key | pt_1_prt_alpha_a_b_key1 + pt_1_prt_beta | pt_key | pt_1_prt_beta_a_b_key2 +(3 rows) + +``` + +## History +Spun out of an uber story + From fcaed9c5ce2ca885c6f731a20de8be903349ff9b Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Thu, 11 Oct 2018 12:29:24 -0700 Subject: [PATCH 04/31] What's wrong with named constraints --- .../alter-table-only-add-constraint.md | 65 +++++++++++++------ .../disallow-named-constraint.md | 26 ++++---- .../why-not-named-constraints.md | 47 ++++++++++++++ 3 files changed, 106 insertions(+), 32 deletions(-) create mode 100644 greenplum-partitions/why-not-named-constraints.md diff --git a/greenplum-partitions/alter-table-only-add-constraint.md b/greenplum-partitions/alter-table-only-add-constraint.md index c1f0dc1..8d95e3d 100644 --- a/greenplum-partitions/alter-table-only-add-constraint.md +++ b/greenplum-partitions/alter-table-only-add-constraint.md @@ -2,11 +2,11 @@ `ALTER TABLE ONLY ... ADD CONSTRAINT ...` should not recurse into partition children ## Context -**GIVEN** we disallow named index-backed (think `UNIQUE`) constraints on partition table DDL +**GIVEN** we [disallow](disallow-named-constraint.md) named index-backed (think `UNIQUE`) constraints on partition table DDL **IN ORDER TO** provide precise control to `pg_dump` on a partition table created on Greenplum 6 with constraints -we need to allow it in a limited form (temporarily relaxing the uniformity of `UNIQUE` constraints) during binary upgrade +We need to allow it in a limited form (temporarily relaxing the uniformity of `UNIQUE` constraints) ## Example @@ -23,6 +23,12 @@ we need to allow it in a limited form (temporarily relaxing the uniformity of `U PARTITION beta START (3) ); + ``` + + Can put the database in a unpredictable state. One possible end state [^wacky_ddl] is: + + ```sql + SELECT conrelid::regclass, conname, conindid::regclass FROM pg_constraint WHERE @@ -36,16 +42,11 @@ we need to allow it in a limited form (temporarily relaxing the uniformity of `U WHERE inhparent = 'pt'::regclass ); - ``` - - Can put the database in a unpredictable state. One possible end state [^footnote_1] is: - - ``` conrelid | conname | conindid ----------------+--------------------------+------------------------- pt | pt_a_b_key1 | pt_a_b_key1 - pt_1_prt_alpha | pt_1_prt_alpha_a_b_key1 | pt_1_prt_alpha_a_b_key1 - pt_1_prt_beta | pt_1_prt_beta_a_b_key2 | pt_1_prt_beta_a_b_key2 + pt_1_prt_alpha | pt_1_prt_alpha_a_b_key2 | pt_1_prt_alpha_a_b_key2 + pt_1_prt_beta | pt_1_prt_beta_a_b_key3 | pt_1_prt_beta_a_b_key3 (3 rows) ``` @@ -55,25 +56,47 @@ we need to allow it in a limited form (temporarily relaxing the uniformity of `U CREATE TABLE pt (a integer, b integer) DISTRIBUTED BY (a) PARTITION BY RANGE(b) (PARTITION alpha END (3), PARTITION beta START (3)); - ALTER TABLE ONLY pt + ALTER TABLE pt ADD CONSTRAINT pt_a_b_key1 UNIQUE (a, b); ``` - - **FIXME** elaborate why this DDL is bad. -## Footnote -1. [^footnote_1] pre-condition: +1. Ideally, you want to dump the above catalog state as: ```sql + CREATE TABLE pt (a integer, b integer) DISTRIBUTED BY (a) + PARTITION BY RANGE(b) (PARTITION alpha END (3), PARTITION beta START (3)); - -- given that we squat some names - CREATE TYPE pt_a_b_key AS (yolo int); - CREATE TYPE pt_1_prt_alpha_a_b_key AS (yolo int); - CREATE TYPE pt_1_prt_beta_a_b_key AS (yolo int); - CREATE TYPE pt_1_prt_beta_a_b_key1 AS (yolo int); + ALTER TABLE ONLY pt_1_prt_alpha ADD CONSTRAINT pt_1_prt_alpha_a_b_key1; + ALTER TABLE ONLY pt_1_prt_beta ADD CONSTRAINT pt_1_prt_beta_a_b_key2; + ALTER TABLE ONLY pt ADD CONSTRAINT pt_a_b_key1; ``` -## FIXME -JZ's self-doubt: What should happen to regular `pg_dump`? Won't a regular restore become a problem too? +[^wacky_ddl]: How to get the database to the wacky state: + + ```sql + -- squat some table names + CREATE TYPE pt_a_b_key AS (yolo int); + CREATE TYPE pt_1_prt_alpha_a_b_key AS (yolo int); + CREATE TYPE pt_1_prt_alpha_a_b_key1 AS (yolo int); + CREATE TYPE pt_1_prt_beta_a_b_key AS (yolo int); + CREATE TYPE pt_1_prt_beta_a_b_key1 AS (yolo int); + CREATE TYPE pt_1_prt_beta_a_b_key2 AS (yolo int); + + -- create table using unnamed constraint syntax + CREATE TABLE pt ( + a integer, + b integer, + UNIQUE (a, b) + ) DISTRIBUTED BY (a) PARTITION BY RANGE(b) + (PARTITION alpha END (3), PARTITION beta START (3)); + + -- erase my tracks + DROP TYPE pt_a_b_key; + DROP TYPE pt_1_prt_alpha_a_b_key; + DROP TYPE pt_1_prt_alpha_a_b_key1; + DROP TYPE pt_1_prt_beta_a_b_key; + DROP TYPE pt_1_prt_beta_a_b_key1; + DROP TYPE pt_1_prt_beta_a_b_key2; + ``` diff --git a/greenplum-partitions/disallow-named-constraint.md b/greenplum-partitions/disallow-named-constraint.md index 3f1632c..5c80f75 100644 --- a/greenplum-partitions/disallow-named-constraint.md +++ b/greenplum-partitions/disallow-named-constraint.md @@ -3,7 +3,7 @@ ## DON'T START THIS STORY UNTIL WE AGREE ON Jacob needs to fist fight Jesse on Monday (Oct 9, 2018) But seriously, we need to be on the same page about: -1. (Jacob) What's wrong with named constraint? Can't we make it deterministic? +1. (Jacob) What's wrong with named constraint? Can't we make it deterministic? [(Jesse) No.](why-not-named-constraints.md) 1. (Jesse) No that's gonna make `pg_dump` on unnamed constraints wrong ## Context: Desired end state for index-backed constraints on partition tables @@ -18,13 +18,13 @@ But seriously, we need to be on the same page about: ## Actual story -Given our vision on eventually identically naming index-backed constraints with their indices, we need to disallow: +Given our vision on eventually identically naming index-backed constraints with their indices, we need to disallow the following on partition tables: -1. name colulmn constraints: +1. named colulmn constraints: - `CREATE TABLE pt (a int, b int CONSTRAINT yolo UNIQUE) PARTITION BY range(b) (END (10));` -1. table constraint at creation: +1. named table constraint at table creation: - `CREATE TABLE pt (a int, b int, CONSTRAINT yolo UNIQUE(a,b)) DISTRIBUTED BY (a) PARTITION BY range(b) (END (10));` -1. table cosntraint post-creation: +1. adding named table cosntraint: - `ALTER TABLE pt ADD CONSTRAINT yolo UNIQUE (a,b);` @@ -72,7 +72,7 @@ BEGIN con.contype <> 'c' AND con.conrelid IN ( - SELECT 'pt'::regclass + SELECT t UNION ALL SELECT inhrelid FROM pg_inherits @@ -98,15 +98,19 @@ DROP TYPE pt_1_prt_beta_a_b_key1; Depending on the database state before the `CREATE TABLE`, the index names are not predictable: ``` - conrelid | conname | conindid -----------------+---------+------------------------- - pt | pt_key | pt_a_b_key1 - pt_1_prt_alpha | pt_key | pt_1_prt_alpha_a_b_key1 - pt_1_prt_beta | pt_key | pt_1_prt_beta_a_b_key2 + relation | constraint | index +----------------+------------+------------------------- + pt | pt_key | pt_a_b_key1 + pt_1_prt_alpha | pt_key | pt_1_prt_alpha_a_b_key1 + pt_1_prt_beta | pt_key | pt_1_prt_beta_a_b_key2 (3 rows) ``` +## FAQ +1. [Why are you against named (index-backed) constraints on a partition table?](why-not-named-constraints.md) + + ## History Spun out of an uber story diff --git a/greenplum-partitions/why-not-named-constraints.md b/greenplum-partitions/why-not-named-constraints.md new file mode 100644 index 0000000..7128fd4 --- /dev/null +++ b/greenplum-partitions/why-not-named-constraints.md @@ -0,0 +1,47 @@ +## What's wrong with named constraints +What's wrong with named (index-backed) constraint on partition tables like the following: + +```sql +CREATE TABLE pt ( + a integer, + b integer, + CONSTRAINT yolo UNIQUE (a, b) +) DISTRIBUTED BY (a) PARTITION BY RANGE(b) + ( + PARTITION alpha END (3), + PARTITION beta START (3) + ); +``` + +TL;DR once an object (index) is created, theirs name _can_ change. Which means that the best we could do is a _syntactic sugar_. **Named constraints can *never* be a desugared syntax.** + +## Can't we just have a consistent naming scheme? +No we can't. Let's assume you propose that for the DDL above, you want to combine the table / partition name with the constraint name: + +``` + conrelid | conname | indexrelid +----------------+---------------------+--------------------- + pt | pt_yolo | pt_yolo + pt_1_prt_alpha | pt_1_prt_alpha_yolo | pt_1_prt_alpha_yolo + pt_1_prt_beta | pt_1_prt_beta_yolo | pt_1_prt_beta_yolo +(3 rows) + +``` + +To enable named constraints as a *desugared* syntax, we need to be able to map a catalog state of the above to a single DDL (think: `pg_dump`). +But that's impossible: + +1. It's very tempting to infer the `CONSTRAINT yolo` by looking at the table name (`pt`) and the index name (`pt_yolo`). But... +1. What if an index on a leaf table gets renamed (example follows)? You easily get an inference conflict. + + ``` + conrelid | conname | indexrelid + ----------------+---------------------+--------------------- + pt | pt_yolo | pt_yolo + pt_1_prt_alpha | pt_1_prt_charlie | pt_1_prt_charlie + pt_1_prt_beta | pt_1_prt_beta_yo | pt_1_prt_beta_yo + (3 rows) + + ``` +1. Should we rename indexes as a partiton joins the hierarchy (think: `EXCHANGE PARTITION`)? +1. There's also a usability issue here: letting the user specify `CONSTRAINT yolo`, and then creating objects not really named `yolo` (and we cannot reasonably refer to in the future using the name `yolo`) is a surprise. From c4eb138032c0f7c9a683c7897fea027e770c56c0 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Fri, 22 May 2020 09:20:09 -0700 Subject: [PATCH 05/31] bundle update --- Gemfile.lock | 212 +++++++++++++++++++++++++-------------------------- 1 file changed, 106 insertions(+), 106 deletions(-) diff --git a/Gemfile.lock b/Gemfile.lock index ba85901..a1f36f9 100644 --- a/Gemfile.lock +++ b/Gemfile.lock @@ -1,13 +1,14 @@ GEM remote: https://rubygems.org/ specs: - activesupport (4.2.10) - i18n (~> 0.7) + activesupport (6.0.3.1) + concurrent-ruby (~> 1.0, >= 1.0.2) + i18n (>= 0.7, < 2) minitest (~> 5.1) - thread_safe (~> 0.3, >= 0.3.4) tzinfo (~> 1.1) - addressable (2.5.2) - public_suffix (>= 2.0.2, < 4.0) + zeitwerk (~> 2.2, >= 2.2.2) + addressable (2.7.0) + public_suffix (>= 2.0.2, < 5.0) coffee-script (2.4.1) coffee-script-source execjs @@ -15,43 +16,42 @@ GEM colorator (1.1.0) commonmarker (0.17.13) ruby-enum (~> 0.5) - concurrent-ruby (1.0.5) - dnsruby (1.61.2) + concurrent-ruby (1.1.6) + dnsruby (1.61.3) addressable (~> 2.5) em-websocket (0.5.1) eventmachine (>= 0.12.9) http_parser.rb (~> 0.6.0) - ethon (0.11.0) + ethon (0.12.0) ffi (>= 1.3.0) eventmachine (1.2.7) execjs (2.7.0) - faraday (0.15.3) + faraday (1.0.1) multipart-post (>= 1.2, < 3) - ffi (1.9.25) + ffi (1.12.2) forwardable-extended (2.6.0) - gemoji (3.0.0) - github-pages (192) - activesupport (= 4.2.10) - github-pages-health-check (= 1.8.1) - jekyll (= 3.7.4) - jekyll-avatar (= 0.6.0) + gemoji (3.0.1) + github-pages (204) + github-pages-health-check (= 1.16.1) + jekyll (= 3.8.5) + jekyll-avatar (= 0.7.0) jekyll-coffeescript (= 1.1.1) - jekyll-commonmark-ghpages (= 0.1.5) + jekyll-commonmark-ghpages (= 0.1.6) jekyll-default-layout (= 0.1.4) - jekyll-feed (= 0.10.0) + jekyll-feed (= 0.13.0) jekyll-gist (= 1.5.0) - jekyll-github-metadata (= 2.9.4) - jekyll-mentions (= 1.4.1) - jekyll-optional-front-matter (= 0.3.0) + jekyll-github-metadata (= 2.13.0) + jekyll-mentions (= 1.5.1) + jekyll-optional-front-matter (= 0.3.2) jekyll-paginate (= 1.1.0) - jekyll-readme-index (= 0.2.0) - jekyll-redirect-from (= 0.14.0) - jekyll-relative-links (= 0.5.3) - jekyll-remote-theme (= 0.3.1) + jekyll-readme-index (= 0.3.0) + jekyll-redirect-from (= 0.15.0) + jekyll-relative-links (= 0.6.1) + jekyll-remote-theme (= 0.4.1) jekyll-sass-converter (= 1.5.2) - jekyll-seo-tag (= 2.5.0) - jekyll-sitemap (= 1.2.0) - jekyll-swiss (= 0.4.0) + jekyll-seo-tag (= 2.6.1) + jekyll-sitemap (= 1.4.0) + jekyll-swiss (= 1.0.0) jekyll-theme-architect (= 0.1.1) jekyll-theme-cayman (= 0.1.1) jekyll-theme-dinky (= 0.1.1) @@ -61,33 +61,32 @@ GEM jekyll-theme-midnight (= 0.1.1) jekyll-theme-minimal (= 0.1.1) jekyll-theme-modernist (= 0.1.1) - jekyll-theme-primer (= 0.5.3) + jekyll-theme-primer (= 0.5.4) jekyll-theme-slate (= 0.1.1) jekyll-theme-tactile (= 0.1.1) jekyll-theme-time-machine (= 0.1.1) - jekyll-titles-from-headings (= 0.5.1) - jemoji (= 0.10.1) + jekyll-titles-from-headings (= 0.5.3) + jemoji (= 0.11.1) kramdown (= 1.17.0) - liquid (= 4.0.0) - listen (= 3.1.5) + liquid (= 4.0.3) mercenary (~> 0.3) - minima (= 2.5.0) - nokogiri (>= 1.8.2, < 2.0) - rouge (= 2.2.1) + minima (= 2.5.1) + nokogiri (>= 1.10.4, < 2.0) + rouge (= 3.13.0) terminal-table (~> 1.4) - github-pages-health-check (1.8.1) + github-pages-health-check (1.16.1) addressable (~> 2.3) dnsruby (~> 1.60) octokit (~> 4.0) - public_suffix (~> 2.0) + public_suffix (~> 3.0) typhoeus (~> 1.3) - html-pipeline (2.8.4) + html-pipeline (2.12.3) activesupport (>= 2) nokogiri (>= 1.4) http_parser.rb (0.6.0) i18n (0.9.5) concurrent-ruby (~> 1.0) - jekyll (3.7.4) + jekyll (3.8.5) addressable (~> 2.4) colorator (~> 1.0) em-websocket (~> 0.5) @@ -100,49 +99,50 @@ GEM pathutil (~> 0.9) rouge (>= 1.7, < 4) safe_yaml (~> 1.0) - jekyll-avatar (0.6.0) - jekyll (~> 3.0) + jekyll-avatar (0.7.0) + jekyll (>= 3.0, < 5.0) jekyll-coffeescript (1.1.1) coffee-script (~> 2.2) coffee-script-source (~> 1.11.1) - jekyll-commonmark (1.2.0) + jekyll-commonmark (1.3.1) commonmarker (~> 0.14) - jekyll (>= 3.0, < 4.0) - jekyll-commonmark-ghpages (0.1.5) + jekyll (>= 3.7, < 5.0) + jekyll-commonmark-ghpages (0.1.6) commonmarker (~> 0.17.6) - jekyll-commonmark (~> 1) - rouge (~> 2) + jekyll-commonmark (~> 1.2) + rouge (>= 2.0, < 4.0) jekyll-default-layout (0.1.4) jekyll (~> 3.0) - jekyll-feed (0.10.0) - jekyll (~> 3.3) + jekyll-feed (0.13.0) + jekyll (>= 3.7, < 5.0) jekyll-gist (1.5.0) octokit (~> 4.2) - jekyll-github-metadata (2.9.4) - jekyll (~> 3.1) + jekyll-github-metadata (2.13.0) + jekyll (>= 3.4, < 5.0) octokit (~> 4.0, != 4.4.0) - jekyll-mentions (1.4.1) + jekyll-mentions (1.5.1) html-pipeline (~> 2.3) - jekyll (~> 3.0) - jekyll-optional-front-matter (0.3.0) - jekyll (~> 3.0) + jekyll (>= 3.7, < 5.0) + jekyll-optional-front-matter (0.3.2) + jekyll (>= 3.0, < 5.0) jekyll-paginate (1.1.0) - jekyll-readme-index (0.2.0) - jekyll (~> 3.0) - jekyll-redirect-from (0.14.0) - jekyll (~> 3.3) - jekyll-relative-links (0.5.3) - jekyll (~> 3.3) - jekyll-remote-theme (0.3.1) - jekyll (~> 3.5) - rubyzip (>= 1.2.1, < 3.0) + jekyll-readme-index (0.3.0) + jekyll (>= 3.0, < 5.0) + jekyll-redirect-from (0.15.0) + jekyll (>= 3.3, < 5.0) + jekyll-relative-links (0.6.1) + jekyll (>= 3.3, < 5.0) + jekyll-remote-theme (0.4.1) + addressable (~> 2.0) + jekyll (>= 3.5, < 5.0) + rubyzip (>= 1.3.0) jekyll-sass-converter (1.5.2) sass (~> 3.4) - jekyll-seo-tag (2.5.0) - jekyll (~> 3.3) - jekyll-sitemap (1.2.0) - jekyll (~> 3.3) - jekyll-swiss (0.4.0) + jekyll-seo-tag (2.6.1) + jekyll (>= 3.3, < 5.0) + jekyll-sitemap (1.4.0) + jekyll (>= 3.7, < 5.0) + jekyll-swiss (1.0.0) jekyll-theme-architect (0.1.1) jekyll (~> 3.5) jekyll-seo-tag (~> 2.0) @@ -170,8 +170,8 @@ GEM jekyll-theme-modernist (0.1.1) jekyll (~> 3.5) jekyll-seo-tag (~> 2.0) - jekyll-theme-primer (0.5.3) - jekyll (~> 3.5) + jekyll-theme-primer (0.5.4) + jekyll (> 3.5, < 5.0) jekyll-github-metadata (~> 2.9) jekyll-seo-tag (~> 2.0) jekyll-theme-slate (0.1.1) @@ -183,60 +183,60 @@ GEM jekyll-theme-time-machine (0.1.1) jekyll (~> 3.5) jekyll-seo-tag (~> 2.0) - jekyll-titles-from-headings (0.5.1) - jekyll (~> 3.3) - jekyll-watch (2.0.0) + jekyll-titles-from-headings (0.5.3) + jekyll (>= 3.3, < 5.0) + jekyll-watch (2.2.1) listen (~> 3.0) - jemoji (0.10.1) + jemoji (0.11.1) gemoji (~> 3.0) html-pipeline (~> 2.2) - jekyll (~> 3.0) + jekyll (>= 3.0, < 5.0) kramdown (1.17.0) - liquid (4.0.0) - listen (3.1.5) - rb-fsevent (~> 0.9, >= 0.9.4) - rb-inotify (~> 0.9, >= 0.9.7) - ruby_dep (~> 1.2) + liquid (4.0.3) + listen (3.2.1) + rb-fsevent (~> 0.10, >= 0.10.3) + rb-inotify (~> 0.9, >= 0.9.10) mercenary (0.3.6) - mini_portile2 (2.3.0) - minima (2.5.0) - jekyll (~> 3.5) + mini_portile2 (2.4.0) + minima (2.5.1) + jekyll (>= 3.5, < 5.0) jekyll-feed (~> 0.9) jekyll-seo-tag (~> 2.1) - minitest (5.11.3) - multipart-post (2.0.0) - nokogiri (1.8.5) - mini_portile2 (~> 2.3.0) - octokit (4.12.0) + minitest (5.14.1) + multipart-post (2.1.1) + nokogiri (1.10.9) + mini_portile2 (~> 2.4.0) + octokit (4.18.0) + faraday (>= 0.9) sawyer (~> 0.8.0, >= 0.5.3) - pathutil (0.16.1) + pathutil (0.16.2) forwardable-extended (~> 2.6) - public_suffix (2.0.5) - rb-fsevent (0.10.3) - rb-inotify (0.9.10) - ffi (>= 0.5.0, < 2) - rouge (2.2.1) - ruby-enum (0.7.2) + public_suffix (3.1.1) + rb-fsevent (0.10.4) + rb-inotify (0.10.1) + ffi (~> 1.0) + rouge (3.13.0) + ruby-enum (0.8.0) i18n - ruby_dep (1.5.0) - rubyzip (1.2.2) - safe_yaml (1.0.4) - sass (3.6.0) + rubyzip (2.3.0) + safe_yaml (1.0.5) + sass (3.7.4) sass-listen (~> 4.0.0) sass-listen (4.0.0) rb-fsevent (~> 0.9, >= 0.9.4) rb-inotify (~> 0.9, >= 0.9.7) - sawyer (0.8.1) - addressable (>= 2.3.5, < 2.6) - faraday (~> 0.8, < 1.0) + sawyer (0.8.2) + addressable (>= 2.3.5) + faraday (> 0.8, < 2.0) terminal-table (1.8.0) unicode-display_width (~> 1.1, >= 1.1.1) thread_safe (0.3.6) - typhoeus (1.3.0) + typhoeus (1.4.0) ethon (>= 0.9.0) - tzinfo (1.2.5) + tzinfo (1.2.7) thread_safe (~> 0.1) - unicode-display_width (1.4.0) + unicode-display_width (1.7.0) + zeitwerk (2.3.0) PLATFORMS ruby From 5f96589f7acc6cecdc6ebfd4c9e5e227cc598911 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Fri, 22 May 2020 09:32:58 -0700 Subject: [PATCH 06/31] ArrayCoerceExpr change --- gpdb-postgres-12-merge/notes.md | 1 + 1 file changed, 1 insertion(+) create mode 100644 gpdb-postgres-12-merge/notes.md diff --git a/gpdb-postgres-12-merge/notes.md b/gpdb-postgres-12-merge/notes.md new file mode 100644 index 0000000..f489838 --- /dev/null +++ b/gpdb-postgres-12-merge/notes.md @@ -0,0 +1 @@ +`ArrayCoerceExpr` changed in commit postgres/postgres@c12d570fa147d0ec273df53de3a2802925d551ba , moving `elemfuncid` and `isExplicit` into `elemexpr` From dd36e2e1ed3a07602f9ac3bb141791e3956c1a1f Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Wed, 2 Sep 2020 11:17:38 -0700 Subject: [PATCH 07/31] WIP 12 merge --- gpdb-postgres-12-merge/notes.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gpdb-postgres-12-merge/notes.md b/gpdb-postgres-12-merge/notes.md index f489838..a1b7c50 100644 --- a/gpdb-postgres-12-merge/notes.md +++ b/gpdb-postgres-12-merge/notes.md @@ -1 +1,3 @@ `ArrayCoerceExpr` changed in commit postgres/postgres@c12d570fa147d0ec273df53de3a2802925d551ba , moving `elemfuncid` and `isExplicit` into `elemexpr` + +`ArrayRef` renamed to `SubscriptingRef` in commit 558d77f20e4e9 From b536590c4299b81fa7da1db39ddbebfba8ac177f Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Sun, 22 Nov 2020 08:25:34 -0800 Subject: [PATCH 08/31] Remove values from terminology --- RefPtr.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/RefPtr.md b/RefPtr.md index 0359f4b..e5bdf8c 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -16,9 +16,6 @@ An _Owner_ is a piece of code (a block scope, a function, an object) that is res "Managing a pointee's lifetime" best manifests itself through through the owner calling `Release()` -## Values -A value refers to the "content" or meaning that occupies the address of an object. 42 is a value. The string `"hello world"` is a value. - # Examples ## Example of a function taking ownership of one of its parameters: From e9ea1aff9a600d786dead53ac72e01c5cd98b44c Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Sun, 22 Nov 2020 08:30:19 -0800 Subject: [PATCH 09/31] Add example of container owning contained elements --- RefPtr.md | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/RefPtr.md b/RefPtr.md index e5bdf8c..d9f4d0d 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -78,7 +78,8 @@ t->Release(); For example `CColRef::Pdrgpul` returns ownership. Similarly `CFunctionalDependency::PcrsKeys` returns ownership. -## Example of an object taking ownership: +## Example of an object owning a field: + ```cpp struct S { T* t_; @@ -91,6 +92,7 @@ struct S { } } ``` + We have 486 occurrences of `Release()` called on a member variable from destructors in ORCA. Often enough, ORCA also does a variant @@ -111,6 +113,16 @@ t->AddRef(); S* s = new S(t); ``` +## Example of an object owning through a container + +```cpp +struct S { + gpos::CDynamicPtrArray *many_t_; +}; +``` + +Here an `S` owns a dynamic pointer array of `T`, but `many_t_` owns each element stored in it. + # Parking Lot Questions: ### Question 1 From c0ef13df64132bf2e490b883d9b1a2a82470b625 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Sun, 22 Nov 2020 09:11:51 -0800 Subject: [PATCH 10/31] Add notes about owning containers Also move pointer conversions earlier as it's simpler. --- RefPtr.md | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/RefPtr.md b/RefPtr.md index d9f4d0d..c779508 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -355,10 +355,15 @@ A local owner variable should not `AddRef()` before being returned as an owner. A member owner variable should `AddRef()` before being returned as an owner. # Conversion -Our vision would be to remove all the annotation once we have sufficient information, and the end result looks like: +Once every pointer becomes annotated as either `Pointer` or `Owner`, we'll have sufficient information about the intent. +We can then convert the code mechanically: 1. A human needs to inspect each unannotated pointer, and either manually annotate it, or come up with new base rules or new propagation rules +1. each non-owning variable `pointer t` is replaced with a raw pointer + 1. Assumption: `t` doesn't do `Release()` for managing lifetime + 1. Assumption: All calls to `t->AddRef()` is passing a pointer to an owner argument at function call + 1. each owner variable `owner t` is replaced with a (hypothetically named) smart pointer `RefPtr t` 1. All references of `t->Release()` should be removed: this happens automatically when `t` 1. goes out of function / block scope, or @@ -370,10 +375,9 @@ Our vision would be to remove all the annotation once we have sufficient informa 1. Open questions: come up with rules that identify `std::move(t)` when we pass `t` to another owner. Obviously this has to be the last time we copy it (_use_ it actually: use-after-move is UB) -1. each non-owning variable `pointer t` is replaced with a raw pointer - 1. Assumption: `t` doesn't do `Release()` for managing lifetime - 1. Assumption: All calls to `t->AddRef()` is passing a pointer to an owner argument at function call - +1. (TBD) special care is needed to convert various owning containers (e.g. `gpos::CDynamicPtrArray`) so that: + 1. The elements stored are of type `RefPtr` + 1. The container no longer manages the lifetype of the pointee any more than normal `construct` / `destroy` the `RefPtr` objects (not pointers) # Frequently Given Answers (FAQ) From 8d5d68a7814584db0b3f980e7f773fa5854ab4cf Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Sun, 22 Nov 2020 11:02:54 -0800 Subject: [PATCH 11/31] Correct the RefPtr sketch Also call out the design choice that RefPtr does not implicitly convert to a raw pointer. --- RefPtr.md | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/RefPtr.md b/RefPtr.md index c779508..5d77759 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -391,17 +391,18 @@ So this just sounds like extremely brittle and bad code that violates all sorts ## Implementation of `RefPtr` Here's a sketch: + ```cpp template struct RefPtr { RefPtr() = default; - // implicit conversion from T* + ~RefPtr() { if (p_) p_->Release(); } + RefPtr(T *p): p_(p) { if (p_) p_->AddRef(); } RefPtr(RefPtr &&other) : p_(other.p_) { other.p_ = nullptr; } - ~RefPtr() { if (p_) p_->Release(); } RefPtr(const RefPtr &other): RefPtr(other.p_) {} - friend void swap(RefPtr& other) { std::swap(p_, sink.p_); } + void swap(RefPtr& other) { std::swap(p_, other.p_); } // copy and move assignment RefPtr &operator=(RefPtr sink) { swap(sink); return *this; } @@ -418,6 +419,18 @@ template RefPtr allocate_ref(MP mp, Args&&... args) { return {GPOS_NEW(mp) T(args...);} } ``` +Note in the sketch above: +1. `RefPtr` is implicitly convertible from `T*`. +1. `RefPtr` is not implicitly convertible to a `T*`. + +Both choices are negotiable, given that we can work on the tooling for conversion. + +See [Chapter 7.7][implicit-conversion-to-raw-poiner-types] from Alexandrescu's +Modern C++ Design for a discussion of why we often should be judicious in +enabling implicit conversion to raw pointers from smart poiners. + +[implicit-conversion-to-raw-poiner-types]: https://www.informit.com/articles/article.aspx?p=31529&seqNum=7 + ## Circular dependency? Q: the above sketch seems to suggest that you need to provide a complete type `T` to `RefPtr`. Can we handle the situation when `T` is forward declared, From d6c1e197d66bfc5b3701ead0cbd45707de22924e Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Mon, 23 Nov 2020 08:04:59 -0800 Subject: [PATCH 12/31] Add quick note to justify `allocate_ref` to avoid naked new --- RefPtr.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/RefPtr.md b/RefPtr.md index 5d77759..9fd7ab6 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -415,6 +415,12 @@ private: }; // helper analogous to std::allocate_shared +// use this to mitigate problems in the following pattern: +// T* t = GPOS_NEW(mp) T(mp, GPOS_NEW(mp) R(mp, ...), GPOS_NEW(mp) S(mp, ...)) +// do this instead: +// RefPtr t = allocate_ref(mp, allocate_ref(...), allocate_ref(...)) +// Reference: GotW #56, GotW #102 + template RefPtr allocate_ref(MP mp, Args&&... args) { return {GPOS_NEW(mp) T(args...);} } ``` @@ -429,6 +435,10 @@ See [Chapter 7.7][implicit-conversion-to-raw-poiner-types] from Alexandrescu's Modern C++ Design for a discussion of why we often should be judicious in enabling implicit conversion to raw pointers from smart poiners. +See GotW #102 and GotW #56 for the reason why we want to avoid the pattern of +naked new expressions and use a helper like `gpos::allocate_ref` suggested at +the bottom of the sketch + [implicit-conversion-to-raw-poiner-types]: https://www.informit.com/articles/article.aspx?p=31529&seqNum=7 ## Circular dependency? From 1a71fb6d18831ff483a1564fa47f5dd6d20635cd Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Wed, 2 Dec 2020 15:55:15 -0800 Subject: [PATCH 13/31] New position on implicit conversion to raw pointer --- RefPtr.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/RefPtr.md b/RefPtr.md index 9fd7ab6..525b4bc 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -429,17 +429,19 @@ Note in the sketch above: 1. `RefPtr` is implicitly convertible from `T*`. 1. `RefPtr` is not implicitly convertible to a `T*`. -Both choices are negotiable, given that we can work on the tooling for conversion. +The first choice is negotiable, given that we can work on the tooling for conversion. See [Chapter 7.7][implicit-conversion-to-raw-poiner-types] from Alexandrescu's -Modern C++ Design for a discussion of why we often should be judicious in -enabling implicit conversion to raw pointers from smart poiners. +Modern C++ Design and [Herb Sutter's Reader Q&A][why-dont-smart-pointers] for a +discussion of why we often should be judicious in enabling implicit conversion +to raw pointers from smart poiners. See GotW #102 and GotW #56 for the reason why we want to avoid the pattern of naked new expressions and use a helper like `gpos::allocate_ref` suggested at the bottom of the sketch [implicit-conversion-to-raw-poiner-types]: https://www.informit.com/articles/article.aspx?p=31529&seqNum=7 +[why-dont-smart-pointers]: https://herbsutter.com/2012/06/21/reader-qa-why-dont-modern-smart-pointers-implicitly-convert-to/ ## Circular dependency? Q: the above sketch seems to suggest that you need to provide a complete type `T` to `RefPtr`. From 7f1aa628b0831da81966d809e2a97fcf5b3d1128 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Wed, 2 Dec 2020 19:21:47 -0800 Subject: [PATCH 14/31] Container touch points --- RefPtr.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/RefPtr.md b/RefPtr.md index 525b4bc..6e28230 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -379,6 +379,25 @@ We can then convert the code mechanically: 1. The elements stored are of type `RefPtr` 1. The container no longer manages the lifetype of the pointee any more than normal `construct` / `destroy` the `RefPtr` objects (not pointers) + +# Observation on container touch poins + +## `CDynamicPtrArray` + +### Only used when `CleanupFn == gpos::CleanupNULL`: + +1. `AppendArray` + +1. `IndexOf` + +1. `IndexesOfSubsequence` + +1. `CreateReducedArray` + +### Not used with `CleanupRelease` + +1. `Find` + # Frequently Given Answers (FAQ) ## Weak Pointers? From fa4dba2e35208dba3a77705782571e9a92aa1f30 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Wed, 2 Dec 2020 20:54:28 -0800 Subject: [PATCH 15/31] Consistently use "field" As opposed to "member variable" and other variations. --- RefPtr.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/RefPtr.md b/RefPtr.md index 6e28230..7691815 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -93,7 +93,7 @@ struct S { } ``` -We have 486 occurrences of `Release()` called on a member variable from destructors in ORCA. +We have 486 occurrences of `Release()` called on a field (non-static member variable) from destructors in ORCA. Often enough, ORCA also does a variant ```patch @@ -269,8 +269,8 @@ pointer foo(T *parm1, U parm2) { We have 242 occurrences of functions returning a parameter in ORCA `.cpp` files (and a lot more in headers). -## base.memOwnSafeRelease -A non-static field (data member) of a struct (or class) that is released in its destructor is an owner. i.e. when we match: +## base.fieldOwnSafeRelease +A field (data member) of a struct (or class) that is released in its destructor is an owner. i.e. when we match: ```cpp struct S { @@ -290,7 +290,7 @@ struct S { }; ``` -We have 324 (`SafeRelease`d, or 486 for `Release`) such member variables in ORCA `.cpp` files. +We have 324 (`SafeRelease`d, or 486 for `Release`) such fields in ORCA `.cpp` files. # Owner propagation Once we write out the annotation done in the base cases, we can further propagate the annotation. @@ -351,8 +351,8 @@ Idea: identify the possible "move" and annotate that. Once we annotate all moves ## val.ownLocalRet A local owner variable should not `AddRef()` before being returned as an owner. -## val.memRet -A member owner variable should `AddRef()` before being returned as an owner. +## val.fieldRet +An owner field should `AddRef()` before being returned as an owner. # Conversion Once every pointer becomes annotated as either `Pointer` or `Owner`, we'll have sufficient information about the intent. From 7cacc18f7812df76583245111c395c9dba582647 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Mon, 7 Dec 2020 21:32:11 -0800 Subject: [PATCH 16/31] Add a tally for owner fields --- RefPtr.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/RefPtr.md b/RefPtr.md index 7691815..ba0004c 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -292,6 +292,12 @@ struct S { We have 324 (`SafeRelease`d, or 486 for `Release`) such fields in ORCA `.cpp` files. +| | gpopt | gporca | total +---|---|---|--- +Release | 11 | 528 | 539 +SafeRelease | 2 | 339 | 341 +total | 13 | 867 | 880 + # Owner propagation Once we write out the annotation done in the base cases, we can further propagate the annotation. We don't know how far we can get with one iteration (because the derivation is iterative / recursive), From b2364a852d0735ecb23a88086b99f8419986e16a Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Mon, 7 Dec 2020 22:21:01 -0800 Subject: [PATCH 17/31] Mention std::experimental::observer_ptr --- RefPtr.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/RefPtr.md b/RefPtr.md index ba0004c..3123247 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -11,6 +11,8 @@ but is never responsible for managing the lifetime of the pointee Pointers assume that the pointers they hold don't dangle (the pointees always outlive them). +In some vocabulary this is also called an [observer](https://en.cppreference.com/w/cpp/experimental/observer_ptr). + ## Owners An _Owner_ is a piece of code (a block scope, a function, an object) that is responsible for managing the lifetime of an object they reference through a pointer. From 6ee7d205f05d5de25219a4847f8d2a6300173fba Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Tue, 8 Dec 2020 20:06:49 -0800 Subject: [PATCH 18/31] Add base rule fieldPtr --- RefPtr.md | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/RefPtr.md b/RefPtr.md index 3123247..c0b54d0 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -300,6 +300,33 @@ Release | 11 | 528 | 539 SafeRelease | 2 | 339 | 341 total | 13 | 867 | 880 +## base.fieldPtr +A field that is never released in the destructor of its class is a pointer. i.e. when we match: + +```cpp +struct S { + T* t_; + const T* ct_; + + ~S() { // no releasing of t_ nor ct_ } +}; +``` + +We annotate: + +```cpp +struct S { + pointer t_; + pointer ct_; + + ~S() { // no releasing of t_ nor ct_ } +}; +``` + +gpopt | gporca | total +---|---|--- +2 | 128 | 130 + # Owner propagation Once we write out the annotation done in the base cases, we can further propagate the annotation. We don't know how far we can get with one iteration (because the derivation is iterative / recursive), From 656303a10446ad48db34bbbefda1099d12170770 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Wed, 9 Dec 2020 20:39:29 -0800 Subject: [PATCH 19/31] Update tally now that we search through typedef --- RefPtr.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/RefPtr.md b/RefPtr.md index c0b54d0..b572e4b 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -296,9 +296,9 @@ We have 324 (`SafeRelease`d, or 486 for `Release`) such fields in ORCA `.cpp` fi | | gpopt | gporca | total ---|---|---|--- -Release | 11 | 528 | 539 -SafeRelease | 2 | 339 | 341 -total | 13 | 867 | 880 +Release | 11 | 537 | 548 +SafeRelease | 2 | 344 | 346 +total | 13 | 881 | 894 ## base.fieldPtr A field that is never released in the destructor of its class is a pointer. i.e. when we match: @@ -325,7 +325,7 @@ struct S { gpopt | gporca | total ---|---|--- -2 | 128 | 130 +7 | 165 | 172 # Owner propagation Once we write out the annotation done in the base cases, we can further propagate the annotation. From 0de9c7b6d412c6874b5a49afc9d3b8b4d97c7b93 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Fri, 11 Dec 2020 18:18:00 -0800 Subject: [PATCH 20/31] Tally owner parameters found by base rule --- RefPtr.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/RefPtr.md b/RefPtr.md index b572e4b..95d0dd1 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -244,8 +244,15 @@ Hint: parameters that are unconditionally released are taking unnecessary owners We can optimize them away in a second pass. We have about 12 such superfluous owners. Maybe a pre-factoring to eliminate them? +gpopt | gporca | total +---|---|--- +2|85|87 + ## base.parmPtr -A function parameter that never has `Release` called on it is a pointer, i.e. we annotate it as + +**This is wrong** + +~~A function parameter that never has `Release` called on it is a pointer~~, i.e. we annotate it as ```cpp void PointsToParam(pointer t) { From 266d359123c285dc3d3104471345798ae002114b Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Fri, 11 Dec 2020 22:51:45 -0800 Subject: [PATCH 21/31] Tally for local var owner base rules --- RefPtr.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/RefPtr.md b/RefPtr.md index 95d0dd1..f17f9ed 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -191,6 +191,12 @@ owner t = new ...; We have 2288 such local variables. +Tally of variables initalized with new (excluding the ones that are released) + +gpopt | gporca | total +---|---|--- +249 | 1459 | 1708 + ## base.varOwnRelease A local variable that has `Release` member function called on it is an owner. i.e. when we match: @@ -221,6 +227,12 @@ A local variable that has the static function `SafeRelease` called on it is an o We have 72 such local variables in ORCA `.cpp` files. +Combined occurrences of local variables that are released: + +gpopt | gporca | total +---|---|--- +118 | 1615 | 1733 + ## base.varPtr Questionable: A local variable that is returned right after an `AddRef()` is a pointer. I don't remember what inspired this rule, and it takes a little more than clang-query to find those. From f1d2f2f0cc642d3b08fd45763e6fed5d829a6305 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Sat, 12 Dec 2020 14:21:11 -0800 Subject: [PATCH 22/31] Add base rule of a function returning owner --- RefPtr.md | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/RefPtr.md b/RefPtr.md index f17f9ed..140e74f 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -271,6 +271,28 @@ void PointsToParam(pointer t) { } ``` +## base.retOwnNew + +A function whose return value is a "new" expression returns ownership. i.e. when we match: + +```cpp +T* foo() { + return new T; +} +``` + +we annotate + +```cpp +owner foo() { + return new T; +} +``` + +gpopt | gporca | total +---|---|--- +38 | 682 | 720 + ## base.retPtr This one is a little ORCA-specific: a function returning a parameter returns a pointer. i.e. when we match: From 01ee34b780faa4fbdb28735351225d761dd84b74 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Mon, 21 Dec 2020 21:17:33 -0800 Subject: [PATCH 23/31] Remove two base rules that are incorrect. No wonder I couldn't articulate why, because they were wrong. --- RefPtr.md | 35 +++++++++++++---------------------- 1 file changed, 13 insertions(+), 22 deletions(-) diff --git a/RefPtr.md b/RefPtr.md index 140e74f..ec30ed0 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -264,13 +264,23 @@ gpopt | gporca | total **This is wrong** -~~A function parameter that never has `Release` called on it is a pointer~~, i.e. we annotate it as +~~A function parameter that never has `Release` called on it is a pointer~~ + +Insight: a function parameter that's never released can be either a pointer or an owner. + +If it's not a pointer, then its ownership must be transferred to another owner. + +For example: ```cpp -void PointsToParam(pointer t) { -} +bool bar(owner t2); + +bool foo(T* t) { return bar(t); } ``` +Here we can infer that parameter `t` of `foo` is an owner: it's never released, but its only uses are as arguments to an owner parameter (without `AddRef()`). +We should add a propagation rule for this. + ## base.retOwnNew A function whose return value is a "new" expression returns ownership. i.e. when we match: @@ -293,25 +303,6 @@ gpopt | gporca | total ---|---|--- 38 | 682 | 720 -## base.retPtr -This one is a little ORCA-specific: a function returning a parameter returns a pointer. i.e. when we match: - -```cpp -T *foo(T *parm1, U parm2) { - return parm1; -} -``` - -we annotate - -```cpp -pointer foo(T *parm1, U parm2) { - return parm1; -} -``` - -We have 242 occurrences of functions returning a parameter in ORCA `.cpp` files (and a lot more in headers). - ## base.fieldOwnSafeRelease A field (data member) of a struct (or class) that is released in its destructor is an owner. i.e. when we match: From 3e55b04f95e1a32601c40caf72311608e4bd5727 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Mon, 11 Jan 2021 13:56:13 -0800 Subject: [PATCH 24/31] Starting a simple slide deck for RefCount refactor --- RefPtr.slides.md | 107 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 107 insertions(+) create mode 100644 RefPtr.slides.md diff --git a/RefPtr.slides.md b/RefPtr.slides.md new file mode 100644 index 0000000..cab7569 --- /dev/null +++ b/RefPtr.slides.md @@ -0,0 +1,107 @@ +--- +title: RefCount Refactoring +--- + +# RefCount Refactoring + +slide: https://hackmd.io/@j-/rJ8v-AVAD#/ + +--- + + +--- + +### Before: + +```cpp + struct T : gpos::CRefCount {}; + using U = T; + struct S : T {}; + + U *F(); + + U *foo(int i, bool b, S *param) { + U *u = F(); + if (i < 42) { + u->AddRef(); + return u; + } + param->AddRef(); + return param; + } + +``` + +---- + +### After + +```cpp + struct T : gpos::CRefCount {}; + using U = T; + struct S : T {}; + + U *F(); + + U *foo(int i, bool b, gpos::pointer param) { + gpos::pointer u = F(); + if (i < 42) { + u->AddRef(); + return u; + } + param->AddRef(); + return param; + } +``` + +--- + +### Before: + +```cpp + struct T : gpos::CRefCount {}; + using U = T; + struct S : T {}; + + U *F(); + + U *foo(int i, bool b, S *param) { + U *u = F(); + if (i < 42) { + u->AddRef(); + return u; + } + param->AddRef(); + return param; + } + +``` + +---- + +### After + +```cpp + struct T : gpos::CRefCount {}; + using U = T; + struct S : T {}; + + U *F(); + + Ref foo(int i, bool b, S * param) { + U * u = F(); + if (i < 42) { + return u; + } + return param; + } +``` + + +--- + +## What To Do Next + +* Incorporate a-priori knowlege of our templated data structures + 1. e.g. We know `CDynamicPtrArray::Append` takes an owner +* Output parameters: e.g. `CExpression**` => `owner*` From 2d165eb06a28d6dd0d6cf55730602929eb6251f8 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Tue, 30 Mar 2021 15:56:29 -0700 Subject: [PATCH 25/31] whatever --- RefPtr.md | 33 ++++++++++++++ RefPtr.slides.md | 112 +++++++++++++++++++++++------------------------ 2 files changed, 87 insertions(+), 58 deletions(-) diff --git a/RefPtr.md b/RefPtr.md index ec30ed0..039fef0 100644 --- a/RefPtr.md +++ b/RefPtr.md @@ -402,6 +402,39 @@ owner g(); I hesitated to generalize the above rule as "a function returning an owner expression, well, returns an owner" because I wasn't sure about parameters. Fortunately ORCA never returns a parameter that has been `Release()`d. +## prop.vfun +If a virtual function returns an owner, then *all* functions it overrides returns an owner. i.e. when we match + +```cpp +struct T {}; + +struct S { +virtual T* foo(); +}; + +struct R : S { +owner foo() override; +} +``` + +We annotate + +```cpp +struct T {}; + +struct S { +virtual owner foo(); +}; + +struct R : S { +owner foo() override; +} +``` + +Conversely, if the "super" function `f` returns an owner, then any override of `f` returns an owner. + +Likewise, every function in the override set of a virtual function returns a pointer if one of them returns a pointer. + # Call site cross-examination Once the propagation converges to a stationary point, we should validate some of our assumptions. diff --git a/RefPtr.slides.md b/RefPtr.slides.md index cab7569..9036e62 100644 --- a/RefPtr.slides.md +++ b/RefPtr.slides.md @@ -14,21 +14,19 @@ slide: https://hackmd.io/@j-/rJ8v-AVAD#/ ### Before: ```cpp - struct T : gpos::CRefCount {}; - using U = T; - struct S : T {}; - - U *F(); - - U *foo(int i, bool b, S *param) { - U *u = F(); - if (i < 42) { - u->AddRef(); - return u; - } - param->AddRef(); - return param; - } +struct U : CRefCount {}; + +U *F(); + +U *foo(int i, bool b, U *param) { + U *u = F(); + if (i < 42) { + u->AddRef(); + return u; + } + param->AddRef(); + return param; +} ``` @@ -37,21 +35,19 @@ slide: https://hackmd.io/@j-/rJ8v-AVAD#/ ### After ```cpp - struct T : gpos::CRefCount {}; - using U = T; - struct S : T {}; - - U *F(); - - U *foo(int i, bool b, gpos::pointer param) { - gpos::pointer u = F(); - if (i < 42) { - u->AddRef(); - return u; - } - param->AddRef(); - return param; - } +struct U : CRefCount {}; + +U *F(); + +U *foo(int i, bool b, pointer param) { + pointer u = F(); + if (i < 42) { + u->AddRef(); + return u; + } + param->AddRef(); + return param; +} ``` --- @@ -59,21 +55,21 @@ slide: https://hackmd.io/@j-/rJ8v-AVAD#/ ### Before: ```cpp - struct T : gpos::CRefCount {}; - using U = T; - struct S : T {}; - - U *F(); - - U *foo(int i, bool b, S *param) { - U *u = F(); - if (i < 42) { - u->AddRef(); - return u; - } - param->AddRef(); - return param; - } +struct T : CRefCount {}; +using U = T; +struct S : T {}; + +U *F(); + +U *foo(int i, bool b, S *param) { + U *u = F(); + if (i < 42) { + u->AddRef(); + return u; + } + param->AddRef(); + return param; +} ``` @@ -82,19 +78,19 @@ slide: https://hackmd.io/@j-/rJ8v-AVAD#/ ### After ```cpp - struct T : gpos::CRefCount {}; - using U = T; - struct S : T {}; - - U *F(); - - Ref foo(int i, bool b, S * param) { - U * u = F(); - if (i < 42) { - return u; - } - return param; - } +struct T : CRefCount {}; +using U = T; +struct S : T {}; + +U *F(); + +Ref foo(int i, bool b, S *param) { + U *u = F(); + if (i < 42) { + return u; + } + return param; +} ``` From 0992cf88f65cf5bfa2803a1a1b839673e0a28b06 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Tue, 30 Mar 2021 15:59:42 -0700 Subject: [PATCH 26/31] Bundle update --- Gemfile.lock | 109 +++++++++++++++++++++++++++++---------------------- 1 file changed, 63 insertions(+), 46 deletions(-) diff --git a/Gemfile.lock b/Gemfile.lock index a1f36f9..6e2172d 100644 --- a/Gemfile.lock +++ b/Gemfile.lock @@ -1,7 +1,7 @@ GEM remote: https://rubygems.org/ specs: - activesupport (6.0.3.1) + activesupport (6.0.3.6) concurrent-ruby (~> 1.0, >= 1.0.2) i18n (>= 0.7, < 2) minitest (~> 5.1) @@ -16,46 +16,49 @@ GEM colorator (1.1.0) commonmarker (0.17.13) ruby-enum (~> 0.5) - concurrent-ruby (1.1.6) - dnsruby (1.61.3) - addressable (~> 2.5) - em-websocket (0.5.1) + concurrent-ruby (1.1.8) + dnsruby (1.61.5) + simpleidn (~> 0.1) + em-websocket (0.5.2) eventmachine (>= 0.12.9) http_parser.rb (~> 0.6.0) ethon (0.12.0) ffi (>= 1.3.0) eventmachine (1.2.7) execjs (2.7.0) - faraday (1.0.1) + faraday (1.3.0) + faraday-net_http (~> 1.0) multipart-post (>= 1.2, < 3) - ffi (1.12.2) + ruby2_keywords + faraday-net_http (1.0.1) + ffi (1.15.0) forwardable-extended (2.6.0) gemoji (3.0.1) - github-pages (204) - github-pages-health-check (= 1.16.1) - jekyll (= 3.8.5) + github-pages (213) + github-pages-health-check (= 1.17.0) + jekyll (= 3.9.0) jekyll-avatar (= 0.7.0) jekyll-coffeescript (= 1.1.1) jekyll-commonmark-ghpages (= 0.1.6) jekyll-default-layout (= 0.1.4) - jekyll-feed (= 0.13.0) + jekyll-feed (= 0.15.1) jekyll-gist (= 1.5.0) jekyll-github-metadata (= 2.13.0) - jekyll-mentions (= 1.5.1) + jekyll-mentions (= 1.6.0) jekyll-optional-front-matter (= 0.3.2) jekyll-paginate (= 1.1.0) jekyll-readme-index (= 0.3.0) - jekyll-redirect-from (= 0.15.0) + jekyll-redirect-from (= 0.16.0) jekyll-relative-links (= 0.6.1) - jekyll-remote-theme (= 0.4.1) + jekyll-remote-theme (= 0.4.3) jekyll-sass-converter (= 1.5.2) - jekyll-seo-tag (= 2.6.1) + jekyll-seo-tag (= 2.7.1) jekyll-sitemap (= 1.4.0) jekyll-swiss (= 1.0.0) jekyll-theme-architect (= 0.1.1) jekyll-theme-cayman (= 0.1.1) jekyll-theme-dinky (= 0.1.1) - jekyll-theme-hacker (= 0.1.1) + jekyll-theme-hacker (= 0.1.2) jekyll-theme-leap-day (= 0.1.1) jekyll-theme-merlot (= 0.1.1) jekyll-theme-midnight (= 0.1.1) @@ -66,34 +69,35 @@ GEM jekyll-theme-tactile (= 0.1.1) jekyll-theme-time-machine (= 0.1.1) jekyll-titles-from-headings (= 0.5.3) - jemoji (= 0.11.1) - kramdown (= 1.17.0) + jemoji (= 0.12.0) + kramdown (= 2.3.0) + kramdown-parser-gfm (= 1.1.0) liquid (= 4.0.3) mercenary (~> 0.3) minima (= 2.5.1) nokogiri (>= 1.10.4, < 2.0) - rouge (= 3.13.0) + rouge (= 3.26.0) terminal-table (~> 1.4) - github-pages-health-check (1.16.1) + github-pages-health-check (1.17.0) addressable (~> 2.3) dnsruby (~> 1.60) octokit (~> 4.0) - public_suffix (~> 3.0) + public_suffix (>= 2.0.2, < 5.0) typhoeus (~> 1.3) - html-pipeline (2.12.3) + html-pipeline (2.14.0) activesupport (>= 2) nokogiri (>= 1.4) http_parser.rb (0.6.0) i18n (0.9.5) concurrent-ruby (~> 1.0) - jekyll (3.8.5) + jekyll (3.9.0) addressable (~> 2.4) colorator (~> 1.0) em-websocket (~> 0.5) i18n (~> 0.7) jekyll-sass-converter (~> 1.0) jekyll-watch (~> 2.0) - kramdown (~> 1.14) + kramdown (>= 1.17, < 3) liquid (~> 4.0) mercenary (~> 0.3.3) pathutil (~> 0.9) @@ -113,14 +117,14 @@ GEM rouge (>= 2.0, < 4.0) jekyll-default-layout (0.1.4) jekyll (~> 3.0) - jekyll-feed (0.13.0) + jekyll-feed (0.15.1) jekyll (>= 3.7, < 5.0) jekyll-gist (1.5.0) octokit (~> 4.2) jekyll-github-metadata (2.13.0) jekyll (>= 3.4, < 5.0) octokit (~> 4.0, != 4.4.0) - jekyll-mentions (1.5.1) + jekyll-mentions (1.6.0) html-pipeline (~> 2.3) jekyll (>= 3.7, < 5.0) jekyll-optional-front-matter (0.3.2) @@ -128,18 +132,19 @@ GEM jekyll-paginate (1.1.0) jekyll-readme-index (0.3.0) jekyll (>= 3.0, < 5.0) - jekyll-redirect-from (0.15.0) + jekyll-redirect-from (0.16.0) jekyll (>= 3.3, < 5.0) jekyll-relative-links (0.6.1) jekyll (>= 3.3, < 5.0) - jekyll-remote-theme (0.4.1) + jekyll-remote-theme (0.4.3) addressable (~> 2.0) jekyll (>= 3.5, < 5.0) - rubyzip (>= 1.3.0) + jekyll-sass-converter (>= 1.0, <= 3.0.0, != 2.0.0) + rubyzip (>= 1.3.0, < 3.0) jekyll-sass-converter (1.5.2) sass (~> 3.4) - jekyll-seo-tag (2.6.1) - jekyll (>= 3.3, < 5.0) + jekyll-seo-tag (2.7.1) + jekyll (>= 3.8, < 5.0) jekyll-sitemap (1.4.0) jekyll (>= 3.7, < 5.0) jekyll-swiss (1.0.0) @@ -152,8 +157,8 @@ GEM jekyll-theme-dinky (0.1.1) jekyll (~> 3.5) jekyll-seo-tag (~> 2.0) - jekyll-theme-hacker (0.1.1) - jekyll (~> 3.5) + jekyll-theme-hacker (0.1.2) + jekyll (> 3.5, < 5.0) jekyll-seo-tag (~> 2.0) jekyll-theme-leap-day (0.1.1) jekyll (~> 3.5) @@ -187,37 +192,44 @@ GEM jekyll (>= 3.3, < 5.0) jekyll-watch (2.2.1) listen (~> 3.0) - jemoji (0.11.1) + jemoji (0.12.0) gemoji (~> 3.0) html-pipeline (~> 2.2) jekyll (>= 3.0, < 5.0) - kramdown (1.17.0) + kramdown (2.3.0) + rexml + kramdown-parser-gfm (1.1.0) + kramdown (~> 2.0) liquid (4.0.3) - listen (3.2.1) + listen (3.5.1) rb-fsevent (~> 0.10, >= 0.10.3) rb-inotify (~> 0.9, >= 0.9.10) mercenary (0.3.6) - mini_portile2 (2.4.0) + mini_portile2 (2.5.0) minima (2.5.1) jekyll (>= 3.5, < 5.0) jekyll-feed (~> 0.9) jekyll-seo-tag (~> 2.1) - minitest (5.14.1) + minitest (5.14.4) multipart-post (2.1.1) - nokogiri (1.10.9) - mini_portile2 (~> 2.4.0) - octokit (4.18.0) + nokogiri (1.11.2) + mini_portile2 (~> 2.5.0) + racc (~> 1.4) + octokit (4.20.0) faraday (>= 0.9) sawyer (~> 0.8.0, >= 0.5.3) pathutil (0.16.2) forwardable-extended (~> 2.6) - public_suffix (3.1.1) + public_suffix (4.0.6) + racc (1.5.2) rb-fsevent (0.10.4) rb-inotify (0.10.1) ffi (~> 1.0) - rouge (3.13.0) - ruby-enum (0.8.0) + rexml (3.2.4) + rouge (3.26.0) + ruby-enum (0.9.0) i18n + ruby2_keywords (0.0.4) rubyzip (2.3.0) safe_yaml (1.0.5) sass (3.7.4) @@ -228,15 +240,20 @@ GEM sawyer (0.8.2) addressable (>= 2.3.5) faraday (> 0.8, < 2.0) + simpleidn (0.2.1) + unf (~> 0.1.4) terminal-table (1.8.0) unicode-display_width (~> 1.1, >= 1.1.1) thread_safe (0.3.6) typhoeus (1.4.0) ethon (>= 0.9.0) - tzinfo (1.2.7) + tzinfo (1.2.9) thread_safe (~> 0.1) + unf (0.1.4) + unf_ext + unf_ext (0.0.7.7) unicode-display_width (1.7.0) - zeitwerk (2.3.0) + zeitwerk (2.4.2) PLATFORMS ruby From c7827e1153e304aad53335463cfe7120972f6ea8 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Tue, 30 Mar 2021 17:35:58 -0700 Subject: [PATCH 27/31] Set membership design sketch --- SetMembershipHint.md | 75 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 75 insertions(+) create mode 100644 SetMembershipHint.md diff --git a/SetMembershipHint.md b/SetMembershipHint.md new file mode 100644 index 0000000..715752c --- /dev/null +++ b/SetMembershipHint.md @@ -0,0 +1,75 @@ +## Motivation + +Set up: +```sql +CREATE TABLE foo (a int, b int); +CREATE TABLE bar (c int, d int); +CREATE TABLE baz (e int, f int); +``` + +```sql +EXPLAIN (COSTS OFF) +SELECT +FROM foo JOIN bar ON b = c; +``` + +Without loss of generality, this is an expected plan: + +``` + Gather Motion 3:1 (slice1; segments: 3) + -> Hash Join + Hash Cond: (foo.b = bar.c) + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.b + -> Seq Scan on foo + -> Hash + -> Seq Scan on bar +``` + +## A Modest Proposal + +1. A motion receiver can send a message out-of-band to all its sender peers, + with necessary information to describe a set membership test. The semantics + of this message is: "here's the subset of things I care about, send me + everything in here". + +1. This is a hint, which means that it can tolerate one-sided errors: false + positives are acceptable, false negatives are not. (Plain speak: it's OK to + give me more than what I ask for; it's not OK to miss anything I want). + +1. This happens mid-flight, which means that the receiver keeps whatever was + already in the receiving buffer, and the sender doesn't have to do "rewind". + +1. There needs to be a new API in the executor alongside `ExecProcNode` so that + `HashJoin` can pass down the hint to lower level executor nodes. + +## Open Questions + +We should be able to "combine" multiple hints, as in the following case: + +```sql +EXPLAIN (COSTS OFF) +SELECT +FROM foo JOIN bar ON b = c JOIN baz ON f = c; +``` + +With a general plan shape of: + +``` + Gather Motion 3:1 (slice1; segments: 3) + -> Hash Join (Jacob) + Hash Cond: (bar.d = baz.e) + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: bar.d + -> Hash Join (Joseph) + Hash Cond: (foo.b = bar.c) + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: foo.b + -> Seq Scan on foo + -> Hash + -> Seq Scan on bar + -> Hash + -> Seq Scan on baz +``` + +Here we see that the Join atop `bar` "Joseph" From abe69cc80ced263dd406781a7177e22023c33f13 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Tue, 30 Mar 2021 18:08:38 -0700 Subject: [PATCH 28/31] Add title --- SetMembershipHint.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/SetMembershipHint.md b/SetMembershipHint.md index 715752c..6e8b0b1 100644 --- a/SetMembershipHint.md +++ b/SetMembershipHint.md @@ -1,3 +1,5 @@ +# Set Membership Hints For Motions + ## Motivation Set up: From f1cba894be20f8d22b15cc91c63ab4c39e7ac5aa Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Thu, 1 Apr 2021 10:25:10 -0700 Subject: [PATCH 29/31] Polish the comebine example, and more --- SetMembershipHint.md | 70 ++++++++++++++++++++++++++++++++++++-------- 1 file changed, 58 insertions(+), 12 deletions(-) diff --git a/SetMembershipHint.md b/SetMembershipHint.md index 6e8b0b1..5b23d51 100644 --- a/SetMembershipHint.md +++ b/SetMembershipHint.md @@ -45,6 +45,40 @@ Without loss of generality, this is an expected plan: 1. There needs to be a new API in the executor alongside `ExecProcNode` so that `HashJoin` can pass down the hint to lower level executor nodes. + 1. Most nodes just pass the request down, verbatim + + 1. Motion receiver will act upon the request by actually messaging all its + senders + + 1. Shared Scan / Material / Sort should drop this on the floor + +## FAQ + +### Should this be a pure-executor thing? Why is the optimizers involved? + +Like most optimizations, this is not a no-brainer (it has non-trivial +overhead). Generating the digest from hash table takes time on the receiver +side, but more significantly, this adds a per-tuple overhead on the motion +sender side. It is the optimizer's job to decide whether the gain is worth the +overhead. + +### We can push the set-membership filter below motion senders, right? + +Not Really. + +1. The filter is meant to be *per-receiver*, so the motion sender is in the + best position to maintain these per-receiver information. + +1. Adding to the *per-receiver* point, if this is a broadcast motion, a random + motion, or a gather motion, the only node in the slice that knows where a + tuple is going would be the top node (the motion sender) + +1. The plan node for the motion node will also need additional hash opclasses + (or directly the hashfn oid I guess). Having this filtering below motion + means we need to proliferate this everywhere (let alone the iffiness of + non-redistribute motion) + + ## Open Questions We should be able to "combine" multiple hints, as in the following case: @@ -52,7 +86,7 @@ We should be able to "combine" multiple hints, as in the following case: ```sql EXPLAIN (COSTS OFF) SELECT -FROM foo JOIN bar ON b = c JOIN baz ON f = c; +FROM foo JOIN bar ON b = c JOIN baz ON e = c; ``` With a general plan shape of: @@ -60,18 +94,30 @@ With a general plan shape of: ``` Gather Motion 3:1 (slice1; segments: 3) -> Hash Join (Jacob) - Hash Cond: (bar.d = baz.e) - -> Redistribute Motion 3:3 (slice2; segments: 3) - Hash Key: bar.d - -> Hash Join (Joseph) - Hash Cond: (foo.b = bar.c) - -> Redistribute Motion 3:3 (slice3; segments: 3) - Hash Key: foo.b - -> Seq Scan on foo - -> Hash - -> Seq Scan on bar + Hash Cond: (bar.c = baz.e) + -> Hash Join (Joseph) + Hash Cond: (foo.b = bar.c) + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.b + -> Seq Scan on foo + -> Hash + -> Seq Scan on bar -> Hash -> Seq Scan on baz ``` -Here we see that the Join atop `bar` "Joseph" +1. Here we see that the Join atop `bar` "Joseph" will hint at its outer child + with a set membership filter generated from the contents of `bar` + +1. Also notice "Joseph" have received an earlier hint from the Hash Join above + it, "Jacob", generated from the contents of scanning `baz` + +1. Basic idea, most nodes won't "remember" the hints they received before, they + just do the normal pass-down-until-it-hits-motion-receiver thing. But the + motion sender will be in a position to combine these filters. + +1. This adds a requirement that the set membership needs to have a + representation that is conducive to merging. (For bloom filters with + parameters `(n,k)`, we need to fix both of them to be able to quickly merge; + for a simple power-of-two hash bits representation, we need to fix the + power-of-two) From a5c89ee679f1c0c213ffb186889ab4cc7be531c5 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Mon, 3 May 2021 14:54:45 -0700 Subject: [PATCH 30/31] WIP --- RefPtr.slides.md | 535 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 504 insertions(+), 31 deletions(-) diff --git a/RefPtr.slides.md b/RefPtr.slides.md index 9036e62..4beac71 100644 --- a/RefPtr.slides.md +++ b/RefPtr.slides.md @@ -1,5 +1,9 @@ --- title: RefCount Refactoring + +slideOptions: + spotlight: + enabled: true --- # RefCount Refactoring @@ -11,10 +15,274 @@ slide: https://hackmd.io/@j-/rJ8v-AVAD#/ --- -### Before: +# Intrusive RefCount + +```cpp +template +class CRefCount { + long refs_{1}; + + public: + CRefCount(const CRefCount&) = delete; + void AddRef() { ++refs_; } + void Release() { + if (--refs_ == 0) delete static_cast(this); + } +}; +``` + +---- + +## Derived + +```cpp +class CBitSet : public CRefCount {}; +``` + +---- + +## Pros + +1. Avoid the "split control block" pitfall of `shared_ptr` +1. Optimization for some particular workloads + +---- + +## Cons: + +1. Losing `const`. (Compared to the idiom of `shared_ptr`) +1. Losing `weak_ptr`. (Have to avoid circularity) + + +--- + +# Manual Reference Counting: + +The explict calls to `AddRef` and `Release` at the right spot to manage object lifetime. + +---- + +## Example 1 + +```cpp +COrderSpec *CPhysicalIndexOnlyScan::PosDerive( + CMemoryPool *CExpressionHandle &) const override { + m_pos->AddRef(); + return m_pos; +} +``` + +```cpp +CPhysicalIndexOnlyScan::~CPhysicalIndexOnlyScan() { + m_pindexdesc->Release(); + m_pos->Release(); +} +``` + + +---- + +## Example 2 + + + +```cpp +pexprOuter->AddRef(); +pexprInner->AddRef(); +pexprPred->AddRef(); +CExpression *pexprResult = + new CExpression(new TJoin(), pexprOuter, pexprInner, pexprPred); + +// add alternative to results +pxfres->Add(pexprResult); +``` + +---- + +## Pros + +1. No magic, what you see is what you get +1. "Drive stick": allows for micro-optimization, pre- C++ 11 + +---- + +## Cons + +1. Manual manipulation of reference count permeates the code base +1. Lifetime manipulation becomes part of the API + * local variables + * return value of functions + * function parameters + * constructor initializers + * class data members + * elements in a container / array +1. Extremely hard to reason about locally + +---- + +## Cons (ex) + +We want these + + +```cpp +vector bitsets_; +unordered_map plans_; +``` + +We get those + +```cpp +CDynamicPtrArray *bitsets_; +CHashMap *hm_; +``` + +--- + +# Vision + +1. Smart pointer that provide ownership semantics +1. Express ownership in code +1. Automatically handle lifecycle +1. `std::move` for micro optimization + + +---- + +## Instead of + + +```cpp! +class U { + CExpression* m_pexpr; + void SetExpr(CGroup* pexpr) { + if (m_pexpr) m_pexpr->Release(); + m_pexpr = pexpr; + } + + U(CExpression* pexpr) : m_pexpr(pexpr) {} + ~U() { + if (m_pexpr) m_pexpr->Release(); + } +}; +``` + +Client code + +```cpp +U* bar(CExpression* pexpr) { + pexpr->AddRef(); + return new U(pexpr); +} +void foo(U* u, CGroup* pgroup) { + pgroup->AddRef(); + u->SetGroup(pgroup); +} +``` + + +---- + +## Ideally... + +Destructor is gone! `AddRef` is gone. + +```cpp! +class U { + Ref m_pexpr; + void SetExpr(Ref pexpr) { m_pexpr = pexpr; } + + U(Ref pexpr) : m_pexpr(pexpr) {} +}; +``` + +Client code + +```cpp +Ref bar(Ref pexpr) { return new U(pexpr); } +void foo(U* u, CGroup* pgroup) { u->SetGroup(pgroup); } +``` + + + +---- + + +## Optimization + +For the performance-minded... + +```cpp! +class U { + Ref m_pexpr; + void SetExpr(Ref pexpr) { m_pexpr = std::move(pexpr); } + + U(Ref pexpr) : m_pexpr(std::move(pexpr)) {} +}; +``` + +Client code + +```cpp +Ref bar(Ref pexpr) { return new U(std::move(pexpr)); } +void foo(U* u, CGroup* pgroup) { u->SetGroup(pgroup); } +``` + +--- + +# Road map + +1. "Semantic marker" / annotation +1. local reasoning +1. One-shot conversion guided by annotations + +---- + +## Annotation + +Helper tags: +```cpp +template +using owner = T; +template +using pointer = T; +``` + +To apply annotation on: +```cpp +U* foo(int, U*); +``` + +We might get +```cpp +owner foo(int, pointer); +``` + +---- + + +## Conversion + +* We "just" need to get every one of the following (reference-counted) annotated + * variable; + * function return type; + * function parameter; + * class data member + +---- + +* We can then perform a conversion of the annotated code: + * substitute a raw pointer `T*` for `pointer` + * substitute a smart pointer `Ref` for `owner` + * remove all `Release` and `AddRef` calls + + +--- + +## Before: ```cpp -struct U : CRefCount {}; +struct U : CRefCount {}; U *F(); @@ -27,19 +295,19 @@ U *foo(int i, bool b, U *param) { param->AddRef(); return param; } - ``` + ---- -### After +## Annotate ```cpp -struct U : CRefCount {}; +struct U : CRefCount {}; U *F(); -U *foo(int i, bool b, pointer param) { +owner foo(int i, bool b, pointer param) { pointer u = F(); if (i < 42) { u->AddRef(); @@ -50,54 +318,259 @@ U *foo(int i, bool b, pointer param) { } ``` ---- +---- -### Before: +## Convert ```cpp -struct T : CRefCount {}; -using U = T; -struct S : T {}; - +struct U : CRefCount {}; U *F(); - -U *foo(int i, bool b, S *param) { +Ref foo(int i, bool b, U *param) { U *u = F(); if (i < 42) { - u->AddRef(); return u; } - param->AddRef(); return param; } +``` + +--- + +# Tooling + +* link to LLVM / Clang for access to the AST (abstract syntax tree) +* simple rules for local reasoning (examples) +* generate edits to source file + +---- +## Example rule + +* A field (data member) released in destructor is an owner. + +When we match +```cpp +struct R { + T* t; + ~R() { t->Release(); } +}; +``` + +We annotate +```cpp +struct R { + owner t; + ~R() { t->Release(); } +}; ``` ---- -### After +## Local reasoning + +* We only need to look at code from one translation unit +* Often we only need to look at code around one variable, or within one function +* This is not only good for tooling, but it's also better for humans + +---- + +## Bonus: propagation rule example + +* Propagation rule: a rule that matches not only code pattern but existing annotation +* e.g. virtual functions share the same "ownership" signature (return type, parameter) + +---- + +## Bonus example (Contd) + + + + + + + + + + + + +
Before propagationAfter One iteration of propagation
+ +```cpp! +struct Q { + virtual U* foo(); + virtual U* bar(); +}; + +struct R : Q { + owner foo() override; + pointer bar() override; +}; +``` + + ```cpp -struct T : CRefCount {}; -using U = T; -struct S : T {}; +struct Q { + virtual owner foo(); + virtual pointer bar(); +}; -U *F(); +struct R : Q { + owner foo() override; + pointer bar() override; +}; +``` -Ref foo(int i, bool b, S *param) { - U *u = F(); - if (i < 42) { - return u; +
+ +--- + +# Scale + +In a half-a-million LOC code base, our tool changed around 35K lines of code. + +``` + git diff --shortstat + 1682 files changed, 33104 insertions(+), 28274 deletions(-) +``` + +--- + +# Bonus + +* How to identify `std::move` (CFG) +* Future work (DFA) + +--- + +# Identifying `move` opportunities + +* Observation: an owner can be moved on its definite last use +* Construct a control flow graph (CFG) +* Find last use in the basic block (BB) immediately before the exit block + + + +Before + +```cpp +bool F(gpos::owner); + +bool bar(T* t) { return F(t); } +``` + +After + +```cpp +bool F(gpos::owner); + +bool bar(owner t) { return F(std::move(t)); } +``` + +--- + +# We can do better than CFG + + + + + + + + + + + +
+Given + +We want +
+ +```cpp +bool F(gpos::owner); + +bool bar(T* t, int x) { + if (F(t)) { + return x < 42; + } else { + return x > 420; } - return param; } ``` + ---- +```cpp +bool F(gpos::owner); + +bool bar(owner t, int x) { + if (F(std::move(t))) { + return x < 42; + } else { + return x > 420; + } +} +``` + +
+ +---- + +## Data Flow Analysis + +* What if the definite last use occurs too early and there are branches after it? We lose an opportunity to move. + +* We can taylor the CFG for each variable, contracting nodes that don't affect the usage of a variable, simplifying the graph + +---- + + + + + + + + + + + +
+Given + +CFG +
+ +```cpp +bool F(gpos::owner); + +bool bar(T* t, int x) { + if (F(t)) { + return x < 42; + } else { + return x > 420; + } +} +``` + + + +* B0 (EXIT) Preds: [1,2] + +* B1 Succs: 0 Preds: 3 + + 1. `return x < 42` + +* B2 Succs: 0 Preds: 3 + + 1. `return x > 420` + +* B3 Succs: [1,2] Preds: 4 + + 1. F(t) + 1. Terminal: if (F(t)) -## What To Do Next +* B4 (ENTRY) Succs: 3 -* Incorporate a-priori knowlege of our templated data structures - 1. e.g. We know `CDynamicPtrArray::Append` takes an owner -* Output parameters: e.g. `CExpression**` => `owner*` +
From 2d350579cdc338392aed3db03e2b85ae2f2e634a Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Sun, 20 Jun 2021 21:16:42 -0700 Subject: [PATCH 31/31] [slides] Reference counting refactor --- RefPtr.slides.md | 58 +++++++++++++++++++++++++++++++++++------------- 1 file changed, 43 insertions(+), 15 deletions(-) diff --git a/RefPtr.slides.md b/RefPtr.slides.md index 4beac71..ac2f05a 100644 --- a/RefPtr.slides.md +++ b/RefPtr.slides.md @@ -527,7 +527,7 @@ bool bar(owner t, int x) { ---- - +
- - + + + - - 1. `return x > 420` +
Given @@ -535,10 +535,15 @@ Given CFG
+Simp. for `t` + +
+ ```cpp bool F(gpos::owner); @@ -552,24 +557,47 @@ bool bar(T* t, int x) { ``` + -* B0 (EXIT) Preds: [1,2] - -* B1 Succs: 0 Preds: 3 - - 1. `return x < 42` +``` +[B0 (EXIT)] + Preds: [1,2] + +[B1] + 1: `return x < 42` + Succs: 0 + Preds: 3 + +[B2] + 1: `return x > 420` + Succs: 0 + Preds: 3 + +[B3] + 1: F(t) + Terminal: if [B3.1] + Succs: [1,2] + Preds: 4 + +[B4 (ENTRY)] Succs: 3 +``` -* B2 Succs: 0 Preds: 3 + -* B3 Succs: [1,2] Preds: 4 +``` +[B0 (EXIT)] + Preds: [1,2] - 1. F(t) - 1. Terminal: if (F(t)) +[B3] + 1: F(t) + Terminal: if [B3.1] + Succs: [0,0] + Preds: 4 -* B4 (ENTRY) Succs: 3 +[B4 (ENTRY)] Succs: 3 +```