How Postgres Stores Updates and Deletes

In the previous section, we learned where PG stores data. We also learned how databases and tables are broken up using an internal OID to name mapping so that all of the file and directory names are simply integers on disk.

We’re going to continue with our example and show how updates and deletes are written to the file.

Prerequisites

At this point, if you don’t have a sample database called millis with a sample table named matter, go back to the last section. You will also need to know the oid2name mapping to view the correct file via hexdump. The previous section will give you the instructions needed to follow along.

Updates

Returning to the example in the previous section, we ran hexdump on the “matter” table using the command hexdump -C 24638. Your filename (i.e. 24638) will be different.

This will look something like:

00000000  00 00 00 00 58 98 b2 01  00 00 00 00 1c 00 b0 1f  |....X...........|
00000010  00 20 04 20 00 00 00 00  b0 9f 98 00 00 00 00 00  |. . ............|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001fb0  64 10 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |d...............|
00001fc0  01 00 04 00 02 09 18 00  01 00 00 00 0d 42 72 69  |.............Bri|
00001fd0  61 6e 1f 53 61 6c 74 20  4c 61 6b 65 20 43 69 74  |an.Salt Lake Cit|
00001fe0  79 37 61 62 63 64 65 66  67 68 69 6a 6b 6c 6d 6e  |y7abcdefghijklmn|
00001ff0  6f 70 71 72 73 74 75 76  77 78 79 7a 00 00 00 00  |opqrstuvwxyz....|
00002000

I’m not going to pretend to know what the value of every bit means, and these could be different from version to version. However, by performing some basic actions, we can look at how the data in this file changes.

Let’s insert another row and force a checkpoint.

INSERT INTO matter (name, city, info) VALUES ('Robert', 'Paris', 'some info'); and CHECKPOINT;

Then display the contents again by running hexdump -C 24638. Now the file looks like this:

00000000  00 00 00 00 38 9b b2 01  00 00 00 00 20 00 78 1f  |....8....... .x.|
00000010  00 20 04 20 00 00 00 00  b0 9f 98 00 78 9f 66 00  |. . ........x.f.|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001f70  00 00 00 00 00 00 00 00  65 10 00 00 00 00 00 00  |........e.......|
00001f80  00 00 00 00 00 00 00 00  02 00 04 00 02 08 18 00  |................|
00001f90  02 00 00 00 0f 52 6f 62  65 72 74 0d 50 61 72 69  |.....Robert.Pari|
00001fa0  73 15 73 6f 6d 65 20 69  6e 66 6f 00 00 00 00 00  |s.some info.....|
00001fb0  64 10 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |d...............|
00001fc0  01 00 04 00 02 09 18 00  01 00 00 00 0d 42 72 69  |.............Bri|
00001fd0  61 6e 1f 53 61 6c 74 20  4c 61 6b 65 20 43 69 74  |an.Salt Lake Cit|
00001fe0  79 37 61 62 63 64 65 66  67 68 69 6a 6b 6c 6d 6e  |y7abcdefghijklmn|
00001ff0  6f 70 71 72 73 74 75 76  77 78 79 7a 00 00 00 00  |opqrstuvwxyz....|
00002000

We can see Robert’s record is now listed. We can also see that it was inserted in the same page because the data from both rows don’t exceed 8KB. This can also be deduced because both tuples are between the 00000000 and 00002000 line numbers.

Let’s insert one more row.

INSERT INTO matter (name, city, info) VALUES ('James', 'Bountiful', 'important stuff'); and CHECKPOINT;

This gives us:

00000000  00 00 00 00 58 9e b2 01  00 00 00 00 24 00 38 1f  |....X.......$.8.|
00000010  00 20 04 20 00 00 00 00  b0 9f 98 00 78 9f 66 00  |. . ........x.f.|
00000020  38 9f 78 00 00 00 00 00  00 00 00 00 00 00 00 00  |8.x.............|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001f30  00 00 00 00 00 00 00 00  66 10 00 00 00 00 00 00  |........f.......|
00001f40  00 00 00 00 00 00 00 00  03 00 04 00 02 08 18 00  |................|
00001f50  03 00 00 00 0d 4a 61 6d  65 73 15 42 6f 75 6e 74  |.....James.Bount|
00001f60  69 66 75 6c 21 69 6d 70  6f 72 74 61 6e 74 20 73  |iful!important s|
00001f70  74 75 66 66 00 00 00 00  65 10 00 00 00 00 00 00  |tuff....e.......|
00001f80  00 00 00 00 00 00 00 00  02 00 04 00 02 08 18 00  |................|
00001f90  02 00 00 00 0f 52 6f 62  65 72 74 0d 50 61 72 69  |.....Robert.Pari|
00001fa0  73 15 73 6f 6d 65 20 69  6e 66 6f 00 00 00 00 00  |s.some info.....|
00001fb0  64 10 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |d...............|
00001fc0  01 00 04 00 02 09 18 00  01 00 00 00 0d 42 72 69  |.............Bri|
00001fd0  61 6e 1f 53 61 6c 74 20  4c 61 6b 65 20 43 69 74  |an.Salt Lake Cit|
00001fe0  79 37 61 62 63 64 65 66  67 68 69 6a 6b 6c 6d 6e  |y7abcdefghijklmn|
00001ff0  6f 70 71 72 73 74 75 76  77 78 79 7a 00 00 00 00  |opqrstuvwxyz....|
00002000

And selecting this data from the database gives us the info we expect:

millis=# select * from matter;
 id |  name  |      city      |            info
----+--------+----------------+----------------------------
  1 | Brian  | Salt Lake City | abcdefghijklmnopqrstuvwxyz
  2 | Robert | Paris          | some info
  3 | James  | Bountiful      | important stuff
(3 rows)

Now let’s see what happens when Robert moves from Paris to Chicago.

UPDATE matter SET city = 'Chicago', info = 'moved cities' WHERE id = 2; and CHECKPOINT;

00000000  00 00 00 00 88 a1 b2 01  00 00 00 00 28 00 00 1f  |............(...|
00000010  00 20 04 20 67 10 00 00  b0 9f 98 00 78 9f 66 00  |. . g.......x.f.|
00000020  38 9f 78 00 00 9f 70 00  00 00 00 00 00 00 00 00  |8.x...p.........|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001f00  67 10 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |g...............|
00001f10  04 00 04 80 02 28 18 00  02 00 00 00 0f 52 6f 62  |.....(.......Rob|
00001f20  65 72 74 11 43 68 69 63  61 67 6f 1b 6d 6f 76 65  |ert.Chicago.move|
00001f30  64 20 63 69 74 69 65 73  66 10 00 00 00 00 00 00  |d citiesf.......|
00001f40  00 00 00 00 00 00 00 00  03 00 04 00 02 09 18 00  |................|
00001f50  03 00 00 00 0d 4a 61 6d  65 73 15 42 6f 75 6e 74  |.....James.Bount|
00001f60  69 66 75 6c 21 69 6d 70  6f 72 74 61 6e 74 20 73  |iful!important s|
00001f70  74 75 66 66 00 00 00 00  65 10 00 00 67 10 00 00  |tuff....e...g...|
00001f80  00 00 00 00 00 00 00 00  04 00 04 40 02 01 18 00  |...........@....|
00001f90  02 00 00 00 0f 52 6f 62  65 72 74 0d 50 61 72 69  |.....Robert.Pari|
00001fa0  73 15 73 6f 6d 65 20 69  6e 66 6f 00 00 00 00 00  |s.some info.....|
00001fb0  64 10 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |d...............|
00001fc0  01 00 04 00 02 09 18 00  01 00 00 00 0d 42 72 69  |.............Bri|
00001fd0  61 6e 1f 53 61 6c 74 20  4c 61 6b 65 20 43 69 74  |an.Salt Lake Cit|
00001fe0  79 37 61 62 63 64 65 66  67 68 69 6a 6b 6c 6d 6e  |y7abcdefghijklmn|
00001ff0  6f 70 71 72 73 74 75 76  77 78 79 7a 00 00 00 00  |opqrstuvwxyz....|
00002000

Robert is in there twice! If we run the select again, we only see the new changes albeit out of order according to their id values.

millis=# select * from matter;
 id |  name  |      city      |            info
----+--------+----------------+----------------------------
  1 | Brian  | Salt Lake City | abcdefghijklmnopqrstuvwxyz
  3 | James  | Bountiful      | important stuff
  2 | Robert | Chicago        | moved cities
(3 rows)

Every time an update occurs, the entire tuple is written again, and the existing tuple is marked as “dead”.1 This way, any transactions that were running prior to this change will return the older version. This aids in the atomicity, consistency, and isolation parts of being ACID compliant.

The reason the select statement returned rows out of id order is because this is the new order of the clean rows in the file.

Deletes

Now let’s delete a row and see what that looks like.

DELETE FROM matter where id = 1 and CHECKPOINT.

00000000  00 00 00 00 f8 a3 b2 01  00 00 00 00 28 00 00 1f  |............(...|
00000010  00 20 04 20 67 10 00 00  b0 9f 98 00 78 9f 66 00  |. . g.......x.f.|
00000020  38 9f 78 00 00 9f 70 00  00 00 00 00 00 00 00 00  |8.x...p.........|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001f00  67 10 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |g...............|
00001f10  04 00 04 80 02 29 18 00  02 00 00 00 0f 52 6f 62  |.....).......Rob|
00001f20  65 72 74 11 43 68 69 63  61 67 6f 1b 6d 6f 76 65  |ert.Chicago.move|
00001f30  64 20 63 69 74 69 65 73  66 10 00 00 00 00 00 00  |d citiesf.......|
00001f40  00 00 00 00 00 00 00 00  03 00 04 00 02 09 18 00  |................|
00001f50  03 00 00 00 0d 4a 61 6d  65 73 15 42 6f 75 6e 74  |.....James.Bount|
00001f60  69 66 75 6c 21 69 6d 70  6f 72 74 61 6e 74 20 73  |iful!important s|
00001f70  74 75 66 66 00 00 00 00  65 10 00 00 67 10 00 00  |tuff....e...g...|
00001f80  00 00 00 00 00 00 00 00  04 00 04 40 02 05 18 00  |...........@....|
00001f90  02 00 00 00 0f 52 6f 62  65 72 74 0d 50 61 72 69  |.....Robert.Pari|
00001fa0  73 15 73 6f 6d 65 20 69  6e 66 6f 00 00 00 00 00  |s.some info.....|
00001fb0  64 10 00 00 68 10 00 00  00 00 00 00 00 00 00 00  |d...h...........|
00001fc0  01 00 04 20 02 01 18 00  01 00 00 00 0d 42 72 69  |... .........Bri|
00001fd0  61 6e 1f 53 61 6c 74 20  4c 61 6b 65 20 43 69 74  |an.Salt Lake Cit|
00001fe0  79 37 61 62 63 64 65 66  67 68 69 6a 6b 6c 6d 6e  |y7abcdefghijklmn|
00001ff0  6f 70 71 72 73 74 75 76  77 78 79 7a 00 00 00 00  |opqrstuvwxyz....|
00002000

The record with the name “Brian” still exists, but there was a subtle change. This tuple is now marked as “dead” as well. Running the select again returns what we expect.

millis=# select * from matter;
 id |  name  |   city    |      info
----+--------+-----------+-----------------
  3 | James  | Bountiful | important stuff
  2 | Robert | Chicago   | moved cities
(2 rows)

On a table with lots of updates and deletes, the file could get really big really fast. In the next section, we’ll talk about how Postgres cleans up after itself.


  1. Note, the terms “dead” and “dirty” are interchangeable in this context.