SQL Server Generate Hash Key

The MD2, MD4, MD5, SHA, and SHA1 algorithms are deprecated starting with SQL Server 2016 (13.x). Use SHA2_256 or SHA2_512 instead. Older algorithms will continue working, but they will raise a deprecation event.

Examples

Return the hash of a variable. The following example returns the SHA2_256 hash of the nvarchar data stored in variable @HashThis.
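The listing for this example did not survive in the text above. A minimal sketch of what it describes follows; the sample string and the nvarchar(32) length are assumptions chosen for illustration, and SHA2_256 requires SQL Server 2012 or later.

```sql
-- Sketch: hash the nvarchar data held in a variable with SHA2_256.
-- The sample value and the nvarchar(32) length are illustrative assumptions.
DECLARE @HashThis nvarchar(32);
SET @HashThis = CONVERT(nvarchar(32), N'dslfdkjLK85kldhnv$n000#knf');

-- HASHBYTES returns varbinary; SHA2_256 always yields 32 bytes.
SELECT HASHBYTES('SHA2_256', @HashThis) AS HashValue;
```

Because the input is nvarchar, the hash is computed over the UTF-16 bytes; hashing the same text as varchar would give a different result.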

Nov 22, 2013: The Fowler–Noll–Vo hash function is my favorite, as it offers an acceptable rate of collisions (lower than CHECKSUM) without sacrificing computational performance. There is also a handy SQL Server CLR implementation of the FNV hash algorithm, which I typically use in databases that have a hashing requirement.

Dec 23, 2018: HASHBYTES - Hashing in MS SQL Server. In short, hashing is the process of generating a value or values from a string of text using a mathematical function. Let's look at the MS SQL function HASHBYTES, whose purpose is to hash values.

Apr 09, 2016: Hash keys replace the sequence numbers (generated by the database engine) of the Data Vault 1.0 standard. They support geographically distributed data warehouses, as well as integration with big data environments like Hadoop. A hash key is a hash value of the business key column(s) used in a Hub or Link.

By: K. Brian Kelley | Updated: 2014-07-25

Problem

I am trying to store password hashes in SQL Server. I know I can generate those hashes using the HASHBYTES() function, but I don't see where it takes a salt. I've been told it's good to have a salt. Is there an easy way to do this?

Solution

Indeed there is. However, first, a caveat. If you can, you want to generate the hash in the application. If you don't, there is the potential for a DBA to see the password using SQL Profiler, a server-side trace, or Extended Events. HASHBYTES() doesn't cause these mechanisms to hide the T-SQL that was passed, so the plaintext password appears in the captured statement.

If you can't do this at the application layer, here's how to do it entirely within SQL Server.

What to Use as a Salt with the SQL Server HASHBYTES() Function

If you're not familiar with what a salt is when it comes to cryptographic functions, it's basically something added to whatever we're trying to protect to make it harder to decrypt the data (two-way functions, like symmetric and asymmetric key functions) or, for one-way functions (AKA hash functions), harder to find a matching input using precomputed tables. The salt should be different for every single piece of protected data, and it should be randomly generated.

Since the salt should be randomly generated, this rules out values derived directly from the date/time or anything of that sort. SQL Server does have a RAND() function, which serves as a random number generator and can be seeded. However, it's a pseudo-random number generator: if you give it the same seed, it'll produce the same results. Therefore, we'll want our range of potential seed values to be large.

We can use the time, specifically the hour, minute, second, and millisecond values, to generate a reasonably large pool of seed values. It is not perfectly random, but nothing ever is when it comes to these functions. Most random number generator functions work off the computer clock, and we're basically using that in order to generate the values for our salt. That leads to something like:
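The original listing is missing at this point. The following is a minimal sketch of the approach described, not the author's exact code: the variable names, the seed formula, and the character range (printable ASCII 33 to 126, excluding the space) are assumptions. It uses inline DECLARE initializers, so it assumes SQL Server 2008 or later.

```sql
-- Sketch only: names and the exact seed formula are assumptions.
DECLARE @Now datetime = GETDATE();

-- Shift hour, minute, and second by powers of ten so that every
-- distinct time of day maps to a distinct integer seed.
DECLARE @Seed int =
      DATEPART(hour,        @Now) * 10000000
    + DATEPART(minute,      @Now) * 100000
    + DATEPART(second,      @Now) * 1000
    + DATEPART(millisecond, @Now);

-- Seed RAND() once; later calls without a seed continue the sequence.
-- FLOOR(RAND() * 94) is 0..93, so each character code is 33..126.
DECLARE @Salt varchar(25) = CHAR(CAST(FLOOR(RAND(@Seed) * 94) AS int) + 33);
DECLARE @i int = 1;

WHILE @i < 25
BEGIN
    SET @Salt = @Salt + CHAR(CAST(FLOOR(RAND() * 94) AS int) + 33);
    SET @i = @i + 1;
END;

SELECT @Salt AS Salt;
```

Where it is available (SQL Server 2008 and later), CRYPT_GEN_RANDOM() is a cryptographically stronger source of random bytes and may be a better fit than a clock-seeded RAND().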

Note that I'm generating the seed value by shifting hour, minute, and second values over by powers of ten. Then I'm using the RAND() function to generate a text string of 25 characters. This will be our salt.

Putting It All Together

With the salt generated, it's a simple matter of concatenating the salt and the password, then submitting the combined string into HASHBYTES(). This results in a solution which will store both the salt and the salt+password hash:
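The listing for this step is also missing. As a hedged sketch of the idea, the following creates a hypothetical table and stored procedure that generate a salt, hash salt + password, and store both. The object names (dbo.SecurityAccounts, dbo.usp_CreateAccount), the 50-character password limit, and the choice of SHA2_512 (SQL Server 2012 or later) are all assumptions.

```sql
-- Hypothetical table; names and sizes are assumptions for illustration.
CREATE TABLE dbo.SecurityAccounts (
    AccountID    int IDENTITY(1,1) PRIMARY KEY,
    LoginName    varchar(50)   NOT NULL UNIQUE,
    Salt         varchar(25)   NOT NULL,
    PasswordHash varbinary(64) NOT NULL  -- SHA2_512 returns 64 bytes
);
GO  -- batch separator used by SSMS and sqlcmd

CREATE PROCEDURE dbo.usp_CreateAccount
    @LoginName varchar(50),
    @Password  varchar(50)
AS
BEGIN
    SET NOCOUNT ON;

    -- Build a clock-based seed, then a 25-character salt (codes 33..126).
    DECLARE @Now datetime = GETDATE();
    DECLARE @Seed int =
          DATEPART(hour,        @Now) * 10000000
        + DATEPART(minute,      @Now) * 100000
        + DATEPART(second,      @Now) * 1000
        + DATEPART(millisecond, @Now);
    DECLARE @Salt varchar(25) = CHAR(CAST(FLOOR(RAND(@Seed) * 94) AS int) + 33);
    DECLARE @i int = 1;

    WHILE @i < 25
    BEGIN
        SET @Salt = @Salt + CHAR(CAST(FLOOR(RAND() * 94) AS int) + 33);
        SET @i = @i + 1;
    END;

    -- Store the salt alongside the hash of salt + password.
    INSERT INTO dbo.SecurityAccounts (LoginName, Salt, PasswordHash)
    VALUES (@LoginName, @Salt, HASHBYTES('SHA2_512', @Salt + @Password));
END;
GO
```

The plaintext password is never stored; only the salt and the resulting hash are kept.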

As for verification, we'll need to basically repeat the same steps, except we'll retrieve the stored salt from the database.
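A sketch of the verification step might look like the following. It assumes a hypothetical table dbo.SecurityAccounts with columns LoginName varchar(50), Salt varchar(25), and PasswordHash varbinary(64) holding a SHA2_512 hash of salt + password; the procedure name and the return convention (0 for a match, 1 otherwise) are likewise assumptions consistent with the description in this tip.

```sql
-- Assumes a hypothetical table dbo.SecurityAccounts
-- (LoginName varchar(50), Salt varchar(25), PasswordHash varbinary(64)).
CREATE PROCEDURE dbo.usp_VerifyAccount
    @LoginName varchar(50),
    @Password  varchar(50)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @Salt varchar(25), @StoredHash varbinary(64);

    -- Retrieve the salt and hash stored when the account was created.
    SELECT @Salt = Salt, @StoredHash = PasswordHash
    FROM dbo.SecurityAccounts
    WHERE LoginName = @LoginName;

    -- Recompute the hash with the stored salt and compare.
    IF @StoredHash IS NOT NULL
       AND @StoredHash = HASHBYTES('SHA2_512', @Salt + @Password)
        RETURN 0;  -- match

    RETURN 1;      -- unknown login, NULL password, or wrong password
END;
GO

-- Usage: capture the return value of the procedure.
DECLARE @Result int;
EXEC @Result = dbo.usp_VerifyAccount @LoginName = 'TestUser', @Password = 'MyPassword';
SELECT @Result AS VerifyResult;
```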

Testing the Solution

We can test it both with a relatively normal sized password and with the longest password allowed.

If we get a zero on the return from the stored procedure, we have a match. With a value of 1, we don't. Therefore, if we run the verification tests all at once, each correct password should return 0 and any incorrect one should return 1.

Next Steps
  • Read up on the hashing algorithms presented by HASHBYTES() so you can choose the correct one.
  • Learn how to use authenticators for other forms of encryption within SQL Server.
  • Know how to restrict what the DBAs see with respect to data that needs to be encrypted.

Last Updated: 2014-07-25

About the author
K. Brian Kelley is a SQL Server author and columnist focusing primarily on SQL Server security.


17 February 2006

Using Hash Keys instead of String Indexes

Your application may require an index based on a lengthy string, or even worse, a concatenation of two strings, or of a string and one or two integers. In a small table, you might not notice the impact. But suppose the table of interest contains 50 million rows? Then you will notice the impact both in terms of storage requirements and search performance.

You don’t have to do it this way. There is a very slick alternative, using what are known alternatively as hash buckets or hash keys.

What is a Hash?

In brief, a hash is the integer result of an algorithm (known as a hash function) applied to a given string. You feed the algorithm a string and you get back an integer. If you use an efficient hash function, there is only a small chance that two different strings will yield the same hash value. When this does occur, it is known as a hash collision. Suppose that you fed this article into a hash algorithm, then changed one character in the article and fed it back in: it would, in all likelihood, return a different integer.
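As a small illustration of that last point, the sketch below hashes two strings that differ by a single character using SQL Server's built-in CHECKSUM function. The sample strings are arbitrary. The two results will normally differ, although CHECKSUM does not guarantee it, and under a case-insensitive collation a change of letter case alone may not change the value.

```sql
-- Two strings differing only in their final character.
-- The integers returned will normally differ, but collisions are possible.
SELECT CHECKSUM('Using Hash Keys instead of String Indexes') AS OriginalHash,
       CHECKSUM('Using Hash Keys instead of String Indexez') AS ChangedHash;
```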

Hash Keys in Database Design

Now, how can we apply hash keys intelligently in our database designs? Suppose that we have these columns in the table of interest:

Column Name    Data Type
Name           Varchar(50)
GroupName      Varchar(50)

A compound index on both these columns would consume up to 50 + 50 characters per row. Given 50 million rows, this is a problem. A hash key based on these two columns is vastly smaller (4 bytes per row). Even better, we don’t have to store the hash keys in the table itself; more accurately, we store them just once, in the index. We create a computed column whose formula is the hash key of these two columns. We then index the hash key column and don’t bother with an index on the two columns mentioned above.
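A minimal sketch of this computed-column approach, assuming the AdventureWorks HumanResources.Department table used later in this article; the column name HashKey and the index name are assumptions.

```sql
USE AdventureWorks;
GO

-- Computed column: not persisted in the table, evaluated from Name and GroupName.
ALTER TABLE HumanResources.Department
    ADD HashKey AS CHECKSUM(Name, GroupName);
GO

-- Index the computed column; the 4-byte hash values are materialized in the index.
-- Indexes on computed columns require certain SET options
-- (for example QUOTED_IDENTIFIER and ANSI_NULLS ON) at creation time.
CREATE NONCLUSTERED INDEX IX_Department_HashKey
    ON HumanResources.Department (HashKey);
```

CHECKSUM over named columns is deterministic, which is what allows the computed column to be indexed.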

The basic process is as follows:

  1. The user (whether a human or an application) queries the values of interest
  2. These values are then converted into a hash key
  3. The database engine searches the index on the hashed column, returning the required row, or a small subset of matching rows.

In a 50 million row table, there will undoubtedly be hash collisions, but that isn’t the point. The set of rows returned will be dramatically smaller than the set of rows the engine would have to visit in order to find an exact match on the original query values. You isolate a small subset of rows using the hash key and then perform an exact-string match against the hits. A search based on an integer column can be dramatically faster than a search based on a lengthy string key, and more so if it is a compound key.

Hash Key Algorithms using the Checksum Function

There are several algorithms available, the simplest of which is built into SQL Server in the form of the Checksum function. For example, the following query demonstrates how to obtain the hash key for any given value or combination of values:

    USE AdventureWorks
    SELECT Name, GroupName, Checksum(Name, GroupName) AS HashKey
    FROM Adventureworks.HumanResources.Department
    ORDER BY HashKey

This results in the following rows (clipped to 10 for brevity):

Name                      GroupName                               HashKey
Tool Design               Research and Development                -2142514043
Production                Manufacturing                           -2110292704
Shipping and Receiving    Inventory Management                    -1405505115
Purchasing                Inventory Management                    -1264922199
Document Control          Quality Assurance                       -922796840
Information Services      Executive General and Administration    -904518583
Quality Assurance         Quality Assurance                       -846578145
Sales                     Sales and Marketing                     -493399545
Production Control        Manufacturing                           -216183716
Marketing                 Sales and Marketing                     -150901473

You have a number of choices as to how you create the hash key. You might elect to fire an INSERT trigger, or use a stored procedure to create the hash key once the values of interest have been obtained, or even to execute an UPDATE query that creates the hash keys and populates the hash column retroactively (so that you can apply this technique to tables that already contain millions of rows). As stated above, my preferred solution is to “store” the hash keys in a calculated column that is then indexed. As such, the index contains the hash keys but the table itself does not.
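For comparison, the retroactive UPDATE option mentioned above might be sketched as follows. The table dbo.DepartmentArchive and its sample rows are hypothetical stand-ins for an existing large table, and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical table standing in for an existing large table.
CREATE TABLE dbo.DepartmentArchive (
    Name      varchar(50) NOT NULL,
    GroupName varchar(50) NOT NULL
);

INSERT INTO dbo.DepartmentArchive (Name, GroupName)
VALUES ('Tool Design', 'Research and Development'),
       ('Production',  'Manufacturing');
GO

-- Add the hash column as nullable so the existing rows are accepted.
ALTER TABLE dbo.DepartmentArchive ADD HashKey int NULL;
GO  -- the new column must exist before the next batch is compiled

-- Populate the hash keys retroactively, then index them.
-- On very large tables, consider updating in batches.
UPDATE dbo.DepartmentArchive
SET HashKey = CHECKSUM(Name, GroupName);

CREATE INDEX IX_DepartmentArchive_HashKey
    ON dbo.DepartmentArchive (HashKey);
```

Unlike the computed column, a stored hash column of this kind has to be kept current by a trigger or by the procedures that insert and update rows.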

Using this technique, you might approach the problem as follows, assuming that the front end passes in the target values for Name and GroupName:

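    CREATE PROCEDURE DemoHash
    (
        @Name Varchar(50),
        @GroupName Varchar(50)
    )
    AS
    -- USE AdventureWorks
    -- Note: the parameter types should match the column types, because
    -- Checksum can return different values for varchar and nvarchar input.
    DECLARE @id as int
    -- Compute the hash key for the values passed in
    SET @id = Checksum(@Name, @GroupName)
    -- Seek on the indexed hash key, then confirm with an exact match
    SELECT * FROM Adventureworks.HumanResources.Department
    WHERE HashKey = @id
    AND Name = @Name AND GroupName = @GroupName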

Conclusion

This approach can yield considerable performance benefits and I encourage you to test it out on your own systems. The technique, as presented here, assumes that the search targets exist in a single table, which may not always be the case. I am still experimenting with ways to use this technique to search joined tables, and when I come up with the best approach, I will let you know.