
Attachment | Size
---|---
Presentation slides | 959.88 KB
At Partners 2012 I presented the session “How to compare tables between TD systems in a dual active environment” – see the attached slides for details if they are of interest to you.
The main emphasis was, and is, to draw attention to an issue that becomes more important as more and more customers use multiple Teradata instances, either as dual active systems or with different appliances for different purposes (e.g. an online archive plus an EDW instance). If you expect to have the same data on two or more systems, the question arises how to PROVE that the data really is the same – joining across systems is not possible.
Current approaches like calculating column measures have major limitations, especially for big tables – see the presentation for details. Unfortunately, classical hash algorithms like SHA-1 or MD5 – which were designed for exactly this purpose – cannot be used within the DB as native functions on table level, as they require the same ordering of the data, which cannot be guaranteed on two different systems. An option is to extract the whole data sorted and run SHA-1 on the output file, but this leads to major costs for the sorts and the network traffic.
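To illustrate the ordering problem, here is a minimal sketch in Python (the row values and the delimiter are made up, purely for illustration): hashing the exported rows with SHA-1 gives different digests when two systems deliver the same rows in a different order, which is exactly why the export has to be sorted first.

```python
import hashlib

# Hypothetical exports of the same table from two systems,
# delivered in a different physical order.
rows_system_a = ["1|Smith|100.00", "2|Jones|250.50", "3|Brown|75.25"]
rows_system_b = ["3|Brown|75.25", "1|Smith|100.00", "2|Jones|250.50"]

def file_hash(rows):
    """SHA-1 over the concatenated export file - order sensitive."""
    h = hashlib.sha1()
    for r in rows:
        h.update(r.encode("utf-8") + b"\n")
    return h.hexdigest()

# Identical content, different order -> different digests.
print(file_hash(rows_system_a) == file_hash(rows_system_b))                   # False
# Only after sorting both exports do the digests match.
print(file_hash(sorted(rows_system_a)) == file_hash(sorted(rows_system_b)))   # True
```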
A new way to overcome these limitations is a table hash function which does not have the ordering requirement of the SHA-1 and MD5 hash functions.
The main idea is based on a two-step approach:
1. Calculate a hash value for each row, where the outcome has the classical hash properties.
2. “Aggregate” the row hash values with an aggregate function whose result does not depend on the ordering of the input (i.e. one that is commutative and associative).
The resulting output can be used as a table hash to compare data content between different systems.
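As a minimal sketch of this two-step idea (not the actual C UDF), assuming SHA-1 for the row hash and addition modulo 2^(hash length) – the ADD MOD aggregation discussed further down – as the order-independent aggregation:

```python
import hashlib

HASH_BITS = 160            # length of the SHA-1 row hash; long enough to keep collisions unlikely
MODULUS = 1 << HASH_BITS

def row_hash(row):
    """Step 1: classical hash per row (column values joined with a delimiter)."""
    canonical = "|".join(map(str, row)).encode("utf-8")
    return int.from_bytes(hashlib.sha1(canonical).digest(), "big")

def table_hash(rows):
    """Step 2: aggregate the row hashes with ADD MOD 2^160 (commutative and associative)."""
    total = 0
    for r in rows:
        total = (total + row_hash(r)) % MODULUS
    return format(total, "040x")

# The same rows in a different order produce the same table hash.
system_a = [(1, "Smith", "100.00"), (2, "Jones", "250.50")]
system_b = [(2, "Jones", "250.50"), (1, "Smith", "100.00")]
print(table_hash(system_a) == table_hash(system_b))   # True
```

The delimiter and the string representation of the column values are assumptions of this sketch; in practice the data preparation has to canonicalise the values identically on both systems.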
The main considerations for choosing the right component functions are:
- for 1.: avoid hash collisions, which requires reasonably long hash values for the row hash. This is, for example, an issue if you want to use the internal HASHROW function of Teradata, as hash collisions occur even for small tables.
- for 2.: use a good aggregation function which also handles multiset tables, where XOR has an issue (illustrated in the sketch after this list).
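To see why plain XOR is problematic as the aggregation on a multiset table, consider this small sketch (again Python, with made-up row values): a row that occurs an even number of times cancels itself out under XOR, so two tables that differ only in duplicate rows get the same XOR hash, while the ADD MOD aggregation from the sketch above detects the difference.

```python
import hashlib

MODULUS = 1 << 160

def row_hash(row):
    return int.from_bytes(hashlib.sha1(row.encode("utf-8")).digest(), "big")

def xor_table_hash(rows):
    # Order-independent, but h XOR h = 0: duplicate rows cancel out.
    acc = 0
    for r in rows:
        acc ^= row_hash(r)
    return acc

def add_table_hash(rows):
    # Order-independent as well, and duplicate rows do contribute.
    return sum(row_hash(r) for r in rows) % MODULUS

t1 = ["A"]
t2 = ["A", "B", "B"]          # multiset table: row "B" occurs twice
print(xor_table_hash(t1) == xor_table_hash(t2))   # True  - the difference is missed
print(add_table_hash(t1) == add_table_hash(t2))   # False - the difference is detected
```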
We implemented a C UDF and compared the results with the export-of-data options, and we already see competitive resource consumption figures with this implementation.
During the last months we have spent more R&D on this and have found a better hash function for the row hash calculation. This reduces the CPU consumption to about 60% compared to the SHA-1 hash function we used before. In addition, we found a better aggregation function which overcomes the limitations of the ADD MOD 2^(hash length) function.
As already discussed in the Partners 2012 presentation, the best performance for this kind of function would be achieved if it were implemented as a core DB function, similar to the HASHROW function. This would also improve usability, as the described data preparation would not be needed. But it is likely that Teradata will only implement this if there is strong demand from the customer side.
In the meantime – and here the sales pitch starts – if you need to prove that the data on two different systems is the same, or you want to discuss any of this in more detail, then contact me.
There are also additional use cases for this function which you should take into consideration, e.g. regression testing of DB versions, systems, and your own software.
So in summary - be aware of the issues and consider the alternatives!