About the Author
Wang Rui, database architect, with many years of PostgreSQL database operations, maintenance, and development work. Previously worked at Civil Aviation Information and Decathlon China. Also has some experience with other database products.
Background
I recently noticed that many friends keep running into bad blocks or corrupted data in PostgreSQL. There is relatively little material about this online in Chinese, so I have collected the errors I ran into and the solutions I used.
Case I: Physical bad block
Error reported during a logical backup:
pg_dump: Dumping the contents of table "xxxx" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: invalid memory alloc request size 18446744073709551613
pg_dump: The command was: COPY xxxxxx (id, active_flag, bkd, blk, go_show, grs, lss, lsv, lt, no_show, value, wl, inv_seg_cabin_id, ind) TO stdout;
pg_dump: [parallel archiver] a worker process died unexpectedly
Cause: the database contains a damaged row. Possible reasons are failing hardware, a bug (in versions before pg9.2 a piece of memory could be overwritten by random data), or an incorrect hardware configuration.
My first thought was PostgreSQL's own zero_damaged_pages parameter. I set it to true, but the error was still there. According to the official documentation, this setting does not physically change the file on disk; it only zeroes the damaged page in the in-memory cache. If this method does clear the error for you, dump the table again right away, or select its contents into another table.
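As a rough sketch of that fallback, the shell function below only prints the SQL one might feed to psql: it turns on zero_damaged_pages for the session and copies the readable rows into a second table. The function name salvage_sql and the _salvaged suffix are made up for illustration. Setting zero_damaged_pages requires superuser rights, and whatever was stored on a zeroed page is lost from the copy, so this assumes a physical backup has already been taken.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the SQL for copying a table
# while zero_damaged_pages is on. Nothing here touches a database.
salvage_sql() {
    table="$1"                       # e.g. public.description
    copy="${table##*.}_salvaged"     # illustrative name for the copy
    cat <<EOF
SET zero_damaged_pages = on;
CREATE TABLE ${copy} AS SELECT * FROM ${table};
RESET zero_damaged_pages;
EOF
}

salvage_sql public.description
```

The output could then be piped into a session, for example `salvage_sql public.description | psql -d js1`, where js1 is the database name seen in the prompts later in this article.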
Solution: delete the damaged row
create extension hstore; (steps omitted)
1. Define the function:
CREATE OR REPLACE FUNCTION find_bad_row(tableName TEXT)
RETURNS tid
AS $find_bad_row$
DECLARE
    result  tid;        -- last CTID seen before the row currently being checked
    curs    REFCURSOR;
    row1    RECORD;
    row2    RECORD;
    tabName TEXT;
    count   BIGINT := 0;
BEGIN
    -- Strip any schema prefix: 'public.description' -> 'description'
    SELECT reverse(split_part(reverse($1), '.', 1)) INTO tabName;
    -- Walk the table by CTID only, without fetching the column values
    OPEN curs FOR EXECUTE 'SELECT ctid FROM ' || tableName;
    count := 1;
    FETCH curs INTO row1;
    WHILE row1.ctid IS NOT NULL LOOP
        result := row1.ctid;
        count := count + 1;
        FETCH curs INTO row1;
        -- Expanding the row through hstore forces every column to be read,
        -- so a damaged row raises an error here
        EXECUTE 'SELECT (each(hstore(' || tabName || '))).* FROM '
            || tableName || ' WHERE ctid = $1' INTO row2
            USING row1.ctid;
        IF count % 100000 = 0 THEN
            RAISE NOTICE 'rows processed: %', count;
        END IF;
    END LOOP;
    CLOSE curs;
    -- No error was raised; row1.ctid is NULL at this point
    RETURN row1.ctid;
EXCEPTION
    WHEN OTHERS THEN
        -- Report the last CTID that was read without error
        RAISE NOTICE 'LAST CTID: %', result;
        RAISE NOTICE '%: %', SQLSTATE, SQLERRM;
        RETURN result;
END
$find_bad_row$
LANGUAGE plpgsql;
2. Use the function to find the problem row:
js1=# select find_bad_row('public.description');
NOTICE: LAST CTID: (78497,6)
NOTICE: XX000: invalid memory alloc request size 18446744073709551613
find_bad_row
--------------
(78497,6)
(1 row)
js1=# select * from xxxxxxx where ctid = '(78498,1)';
ERROR: invalid memory alloc request size 18446744073709551613
js1=# delete from xxxxxx where ctid = '(78498,1)';
Here xxxx stands for the table that needs to be processed.
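Note that find_bad_row reports the last CTID it could read, so the damaged row is one that follows it, which is why the statements above use (78498,1) rather than (78497,6). A small sketch of how the nearest candidates can be derived from the reported value follows. The helper name next_ctids is hypothetical, dead or missing item slots can mean the real row lies further on, and each candidate still has to be confirmed with a SELECT as shown above.

```shell
#!/bin/sh
# Hypothetical helper: given the last readable CTID "(page,item)",
# print the two nearest places where the next row can live.
next_ctids() {
    inner=$(printf '%s' "$1" | tr -d '()')   # "(78497,6)" -> "78497,6"
    page="${inner%,*}"                       # block (page) number
    item="${inner#*,}"                       # item number within the page
    echo "($page,$((item + 1)))"             # next item on the same page
    echo "($((page + 1)),1)"                 # first item on the next page
}

next_ctids "(78497,6)"
# prints (78497,7) and (78498,1), one per line
```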
3. Then run pg_dump again.
For a detailed analysis, see: https://www.postgresql.org/message-id/54889986.3000308%40gmail.com
Case II: pg_clog file corruption caused by a power outage
pg_clog damage
Error message: Could not read from file "pg_clog/0646" at offset 243287
The server lost power abnormally. Because this was a test database, there was no backup and no standby (for a DBA, backups are a lifeline: whether the database is test or production, always make a backup).
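To see which transactions the damaged file covers, some arithmetic helps. Assuming the default 8 kB block size, a pg_clog segment holds 32 pages (262144 bytes), and each byte stores the status of four transactions at two bits apiece, so one segment covers 1048576 transaction IDs. The sketch below works out the XID range for the segment named in the error message; the variable names are illustrative only.

```shell
#!/bin/sh
# Assumptions: default 8 kB BLCKSZ, 32 pages per clog segment,
# 2 status bits per transaction (4 transactions per byte).
seg_name=0646                        # file name from the error message
seg_bytes=$((32 * 8192))             # bytes in one segment
xids_per_seg=$((seg_bytes * 4))      # transactions tracked by one segment
seg_no=$((0x$seg_name))              # segment names are hexadecimal
first_xid=$((seg_no * xids_per_seg))
last_xid=$((first_xid + xids_per_seg - 1))
echo "segment $seg_name covers XIDs $first_xid to $last_xid"
```

The 262144-byte segment size is the same figure used for the forged file in the steps that follow.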
- Take a full physical backup of the database first (as insurance for the operations that follow)
- Forge the data block (a block in which every transaction is marked committed), then set its ownership and permissions
for i in {1..262144}; do printf '\125'; done > committed
ls -l committed
od -xv committed | head
od -xv committed | tail
$ ls -l committed
-rw-r--r-- 1 root root 262144 2009-06-25 11:01 committed
$ od -xv committed | head
0000000 5555 5555 5555 5555 5555 5555 5555 5555
0000020 5555 5555 5555 5555 5555 5555 5555 5555
0000040 5555 5555 5555 5555 5555 5555 5555 5555
0000060 5555 5555 5555 5555 5555 5555 5555 5555
0000100 5555 5555 5555 5555 5555 5555 5555 5555
0000120 5555 5555 5555 5555 5555 5555 5555 5555
0000140 5555 5555 5555 5555 5555 5555 5555 5555
0000160 5555 5555 5555 5555 5555 5555 5555 5555
0000200 5555 5555 5555 5555 5555 5555 5555 5555
0000220 5555 5555 5555 5555 5555 5555 5555 5555
$ od -xv committed | tail
0777560 5555 5555 5555 5555 5555 5555 5555 5555
0777600 5555 5555 5555 5555 5555 5555 5555 5555
0777620 5555 5555 5555 5555 5555 5555 5555 5555
0777640 5555 5555 5555 5555 5555 5555 5555 5555
0777660 5555 5555 5555 5555 5555 5555 5555 5555
0777700 5555 5555 5555 5555 5555 5555 5555 5555
0777720 5555 5555 5555 5555 5555 5555 5555 5555
0777740 5555 5555 5555 5555 5555 5555 5555 5555
0777760 5555 5555 5555 5555 5555 5555 5555 5555
1000000
chown postgres:postgres committed
chmod 600 committed
mv -i committed $PGDATA/pg_clog/0646
Note that this only works around this particular error. It cannot repair damage to the underlying data files, so if you have a backup, restoring from it is the better option.
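The printf loop above runs 262144 iterations; an equivalent file can be produced in one pass with dd and tr. The byte 0x55 (octal 125) is 01010101 in binary, that is, four two-bit "committed" statuses packed together. This is only a sketch: it writes a scratch file named committed in a temporary directory and checks the result, and the chown, chmod, and mv into $PGDATA are left to the steps shown earlier.

```shell
#!/bin/sh
# Build a 256 kB clog segment in which every transaction reads as committed.
# Works in a scratch directory; nothing under $PGDATA is touched.
workdir=$(mktemp -d)
out="$workdir/committed"

# 32 pages of 8192 zero bytes, each byte rewritten to 0x55 (octal 125).
dd if=/dev/zero bs=8192 count=32 2>/dev/null | tr '\000' '\125' > "$out"

size=$(wc -c < "$out" | tr -d ' ')
# Count the bytes that are NOT 0x55; this should be zero.
other=$(tr -d '\125' < "$out" | wc -c | tr -d ' ')
echo "size=$size bytes, non-0x55 bytes=$other"
```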
Case III: TOAST table damage
missing chunk number x for toast value x in pg_toast_x
The data in the TOAST table associated with a particular table is corrupted.
Solution quoted from: http://m.2cto.com/database/201802/720718.html
1. Identify which table the problem TOAST table belongs to:
select 2619::regclass;
regclass
--------------
pg_statistic
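The number in a TOAST relation's name is normally the OID of the table that owns it, which is why casting 2619 to regclass above names pg_statistic. As a small sketch, the OID can be pulled out of the error text and turned into that query; the sample message and the helper name owner_query are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn a "missing chunk number" error line into the
# lookup query for the table that owns the TOAST relation.
owner_query() {
    msg="$1"
    oid="${msg##*pg_toast_}"     # keep what follows the last "pg_toast_"
    oid="${oid%%[!0-9]*}"        # drop anything after the digits
    echo "select ${oid}::regclass;"
}

# Sample message; the chunk and value numbers are made up.
owner_query "missing chunk number 0 for toast value 12345 in pg_toast_2619"
```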
2. Once the affected table is known, first try some simple repairs on it:
REINDEX table pg_toast.pg_toast_2619;
REINDEX table pg_statistic;
VACUUM ANALYZE pg_statistic;
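3. Locate the corrupted rows in the table by running:
DO $$
declare
    v_rec record;
BEGIN
    for v_rec in SELECT * FROM pg_statistic loop
        -- Print each row's CTID and then the full row. The loop aborts with
        -- the TOAST error when it reaches a corrupted row, so the last CTID
        -- printed points at (or just before) it.
        raise notice 'Parameter is: %', v_rec.ctid;
        raise notice 'Parameter is: %', v_rec;
    end loop;
END;
$$
LANGUAGE plpgsql;
4. Delete the records located in step 3: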
delete from pg_statistic where ctid ='(50,3)';
5. Repeat steps 3 and 4 until all the problem records have been removed.
6. At this point the TOAST problem is resolved. Afterwards, run database maintenance or rebuild all indexes.
Generally speaking, PostgreSQL relies on the archive or the WAL to commit or roll back transactions on its own during recovery. My environment had no archiving, so the only option was to delete the corrupted data by hand.
Finally, I want to stress that many of these problems come down to not having a reliable backup. Whatever the situation, take a backup first, and remember that verifying the backup is just as important!
Reposted from:
http://blog.sina.com.cn/s/blog_67d069a90102vibc.html