Adding or dropping a column, or modifying the list of compressed values of an existing table, is quite an expensive operation. For a large table it might result in a huge amount of CPU and IO usage and a loooooooong runtime. This blog discusses the pros and cons of the different ways to do it.
Alter Table vs. Insert Select vs. Merge Into
As always in SQL, there are multiple ways to reach the same goal: modify the table directly or move the data to a new table. The former is ALTER TABLE (Alter), the latter INSERT SELECT (InsSel) or its less well-known variation MERGE INTO (Merge).
Let's start with a list of pros and cons:
Criterion | ALTER TABLE | INSERT SELECT | MERGE INTO |
---|---|---|---|
Needs Transient Journal? | no | no | no |
ABORT possible? | no | yes (fast) | yes (fast) |
Rollback during system restart? | no | yes (fast) | yes (fast) |
LOCK on source table | exclusive | read | read |
Spoolspace used | no | yes, same as source | no |
Additional Permspace used | low, 2 cylinders per AMP | high, same as source | high, same as source |
Works on a table copy? | no | yes | yes |
Must Create/Drop/Rename Table? | no | yes | yes |
Must recreate Secondary/Hash/Join Indexes, Foreign Keys, Statistics, Comments, Access Rights? | no | yes | yes |
Supports changing Primary Index/Partitioning? | no | yes | yes |
You can easily spot that InsSel and Merge are quite similar, but Alter is usually different.
The only common ground is the Transient Journal: none of the three uses it for the actual rows (of course there are some entries indicating that work is in progress, but the rows themselves are not journaled). Because of that, InsSel and Merge can easily be aborted and will roll back quite fast (just deleting all rows in the target table). Alter is different: once it has started it must finish, and there's no way to abort it. Even a system shutdown can't stop it; it will simply continue after the restart. Some will consider this a positive, others a negative :-)
The most important difference is availability during the restructuring process: both InsSel and Merge apply a read lock, allowing concurrent read access, while Alter needs an exclusive lock that blocks any access to the target table. That's the main reason why Alter is not used in most environments. Additionally, before TD13 there was a table-level write lock on dbc.AccessRights which was held throughout the whole process, easily blocking other sessions. In current releases the duration of this lock has been greatly reduced, so other requests will only be blocked for a short period. Some additional RowHash locks on system tables usually don't interfere with other requests, but might block backups.
Neither Alter nor Merge uses spool: Alter moves blocks at the cylinder level and Merge merges the source rows directly into the target table. InsSel, however, always needs to spool the source data, which is especially bad for large tables when Explain shows "The result spool file will not be cached in memory".
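To make the comparison concrete, here is a minimal sketch of the two load statements. It assumes a hypothetical source table `tab` and an empty, already restructured copy `tab_new`, both with `pk_col` as primary index; all names are illustrative. The exact rules for the MERGE `ON` clause (equality on the target's primary index, plus the partitioning columns for a PPI table) should be checked against the documentation for your release.

```sql
-- Hypothetical names: tab (source), tab_new (empty target),
-- pk_col (primary index of both tables), col_a (ordinary column).
-- Run ONE of the two variants, not both.

-- Variant 1: INSERT SELECT.
-- According to the text, the source rows are spooled before the insert.
INSERT INTO tab_new (pk_col, col_a)
SELECT pk_col, col_a
FROM tab;

-- Variant 2: MERGE INTO.
-- The target is empty, so no row ever matches and every source row
-- takes the WHEN NOT MATCHED branch.
MERGE INTO tab_new AS tgt
USING (SELECT pk_col, col_a FROM tab) AS src
   ON tgt.pk_col = src.pk_col
WHEN NOT MATCHED THEN
   INSERT (pk_col, col_a)
   VALUES (src.pk_col, src.col_a);
```

Prefixing either statement with EXPLAIN shows whether a spool step is planned before anything is actually run.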
Keeping a copy of the original table is often regarded as an advantage of Merge and InsSel ("just in case"), but when you're constrained on permspace you might prefer Alter's low overhead of a few megabytes per AMP.
However, the biggest advantage of Alter is its simplicity: just submit "ALTER TABLE tab ADD new_col int, ADD existing_col COMPRESS ('bla');" and that's it.
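Written out as a script, a minimal sketch might look like the following. The table and column names (`tab`, `new_col`, `existing_col`, `obsolete_col`) and the compress values are placeholders. `obsolete_col` is an assumption added only to illustrate dropping a column. Whether all three changes can be combined in one statement may depend on your release and the column types involved, so treat this as something to verify, not as a guaranteed recipe.

```sql
-- Hypothetical table and column names, for illustration only.
ALTER TABLE tab
   ADD new_col INTEGER,                       -- add a new column
   ADD existing_col COMPRESS ('bla', 'foo'),  -- ADD on an existing column sets its new compress list
   DROP obsolete_col;                         -- drop a column that is no longer needed
```

Combining the changes in a single statement is the point here: the table is restructured once instead of once per change.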
Compare this to all the additional steps needed for InsSel or Merge. It's not only CREATE/DROP/RENAME: all those COMMENTs, GRANTs and COLLECT STATS must be scripted beforehand and then reapplied, too. Maintaining Referential Integrity might be complicated when the table is referenced in a Foreign Key. And since the target table is created with the Primary Index only to speed up processing, any additional index must be recreated afterwards.
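The steps above can be sketched roughly as follows. Every object name here (`tab`, `tab_new`, `tab_old`, `pk_col`, `col_a`, `new_col`, `some_role`) is hypothetical, and the statements in step 4 stand in for whatever indexes, statistics, comments and rights the original table really had. Foreign keys are left out of this sketch.

```sql
-- Hypothetical names throughout; adapt to the real table definition.

-- 1. Create the restructured copy with the primary index only.
CREATE MULTISET TABLE tab_new (
   pk_col  INTEGER NOT NULL,
   col_a   VARCHAR(20) COMPRESS ('bla'),
   new_col INTEGER
) PRIMARY INDEX (pk_col);

-- 2. Load it (INSERT SELECT shown here; MERGE INTO is the alternative).
INSERT INTO tab_new (pk_col, col_a)
SELECT pk_col, col_a
FROM tab;

-- 3. Swap the tables; the old one is kept as a safety copy.
RENAME TABLE tab TO tab_old;
RENAME TABLE tab_new TO tab;

-- 4. Reapply everything that was scripted from the old table beforehand.
CREATE INDEX (col_a) ON tab;
COLLECT STATISTICS ON tab COLUMN (pk_col);
COMMENT ON TABLE tab IS 'restructured copy';
GRANT SELECT ON tab TO some_role;

-- 5. Once the new table has been verified, release the permspace.
DROP TABLE tab_old;
```

Until step 5 runs, both copies occupy permspace, which is the "high, same as source" overhead listed in the table.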
Resource usage and runtime
I'm not showing exact numbers because your mileage may vary, but for tables without secondary indexes the ranking by CPU/IO usage, from lowest to highest, is usually:
- Alter Table
- Merge Into
- Insert Select
In my test cases Merge needed almost twice the CPU and IO of an Alter and InsSel added another 20%.
When Secondary/Hash/Join indexes exist InsSel gets closer to Merge but the gap to Alter increases drastically: Alter still needs to modify only the base rows instead of re-building all the indexes.
Runtime differences should be similar to CPU/IO, but they will vary greatly amongst systems due to different bottlenecks and you should run some tests on your own system.
Conclusion
I would strongly recommend using Alter Table, or at least starting to consider it. If you're concerned about availability, you should bear in mind that this process will probably be scheduled outside business hours anyway.
And when you need to change the [P]PI or you just want the safety of a copy of the old table, you should definitely prefer Merge Into over good ol' Insert Select.