Following up on this article about querying JSON Data I would like to talk about how to improve searches on JSON data inside SQL Server. Starting SQL Server 2016 Microsoft deployed a set of functions that allow us to work with JSON data in a structured way inside SQL Server.

I will introduce a small usage sample for the JSON_VALUE function in combination with indexes to improve information retrieval from a table containing one JSON object. For our testes, we have a UserDetailTest table that has more than 500k rows with 2 columns: an UserId and a nvarchar(max) to hold a small JSON details string like the following:

SELECT *, LEN(DetailsJSON) AS [Len(DetailsJSON)] FROM UserDetailTest

I will activate time statistics and clean SQL Server cache between each query to have some consistency across the execution times using the following SQL statements:

CHECKPOINT
DBCC DROPCLEANBUFFERS
SET STATISTICS TIME ON

JSON_VALUE Function intro

The JSON_VALUE function extracts a value from a JSON string. This functions receives 2 arguments, the first being an expression for the JSON value and the second a path for the value we want to obtain. A simple sample with an inline JSON object would be the following:

SELECT JSON_VALUE('{"PostalCode":"376-3765","PhoneNumber":"351003765718"}', '$.PhoneNumber')

The return is a scalar value (nvarchar(4000)) for the PhoneNumber element.

Query scenarios

Starting with one of the really worst case scenario (yes, this can happen…):

More than 17 seconds really seems like a worst scenario! Next, we will use the JSON_VALUE function to get the PhoneNumber from the JSON string and use it in our where clause:

This still takes more than 3 seconds. A good improvement, but yet a high cost if we need to search information in this way.

Index creation

Let’s add a new virtual column to table that displays the result from the JSON_VALUE function – this will allow us to create an index and simplify the SELECT queries.

ALTER TABLE UserDetailTest 
	ADD vPhoneNumber AS 
	CAST(JSON_VALUE(DetailsJSON, '$.PhoneNumber') AS NVARCHAR(255));

The cast truncates the output from the JSON_VALUE to ensure that it does not exceed the maximum lenght for the index key. Now, if we search using the column, the result is still more or less the same 3 seconds has before:

Let’s create an index and perform the search again:

CREATE INDEX idx_vPhoneNumber ON [UserDetailTest] (vPhoneNumber);

Consequently the improvement is huge, 13 miliseconds is more interesting! But what’s the cost?

The cost

When we create an index, we’re basically trading space for time – more occupied space versus faster operations. Therefore let’s see what the increase is using sp_spaceused function before and after index creation:

rowsreserveddataindex_sizeunused
Initial557801 1120904 KB1115584 KB5224 KB96 KB
After Index5578011148752 KB1115584 KB32992 KB176 KB

Probably, an interesting comparison would be if the PhoneNumber column was an explicit column in that table containing the value. Let’s do that!

ALTER TABLE UserDetailTest  
	ADD PhoneNumberCopy NVARCHAR(255)

UPDATE UserDetailTest
	SET PhoneNumberCopy = vPhoneNumber

The sp_spaceused remained the same – SQL Server internal black magic regarding space allocation (for another time!). But performance wise, for this approach in my machine this query still took more than 2 seconds to complete:

Slightly better than the query with the JSON function but the advantage is that in this scenario SQL can better optimize the searches. The following runs of the same query without clearing the cache returns results much faster – each took around 220 miliseconds to complete:

Summary Results

Just to sum up all the results for the different types of searches comparing the first execution after cache clean up and the following runs. I did several iterations for each step just to make sure the results were consistent although the goal was just to have a baseline. Certanly, the use of this will depend on each case.

WHERE clausule typeFirst runFollowing runs
WHERE DetailsJson LIKE '%PhoneNumber":"351003765718"%'
17754 msaround the same
WHERE JSON_VALUE(DetailsJson, '$.PhoneNumber') = '351003765718'
3375 ms1958 ms
WHERE vPhoneNumber = '351003765718'
vPhoneNumber is not indexed 
3339 ms1332 ms
WHERE vPhoneNumber = '351003765718'
vPhoneNumber is indexed
13 ms0 ms
WHERE PhoneNumberCopy  = '351003765718'
PhoneNumberCopy  is of type nvarchar(255) and explicitly contains all values
2309 ms222 ms

6 COMMENTS

  1. Hallo Goncalo!
    Very interesting article!
    But what i miss is the comparison to the explicit column [PhoneNUmberCopy] with an index for the column.
    How ist the performance and the cost of the query in that Scenario?

    Regards
    Hans-Jörg

    • Hello Hans-Jörg Mayr,

      Thank you for your comment and question.

      Performance wise, for that scenario, considering the number of rows being around 500k and only a small percentage of values in that column are equal (more or less 10%). With an index we will have first queries around 2 ms and subsequent ones around 0ms. And creating this new index we had an increase in the index_size for this table of around 29000 KB.

      King regards
      Gonçalo Melo

  2. Hi Goncalo,

    Thanks for sharing this out.
    This is interesting that I can speed up by adding index to the table.

    Thumbs up for this!

    • Hi ev.argel

      Thank you.
      I’m glad you liked it.

      King regards,
      Gonçalo Melo

    • Hi Ice de Ocampo,

      Thank you,
      I’m glad you found this useful!

      Kind Regards,
      Gonçalo Melo

LEAVE A REPLY

Please enter your comment!
Please enter your name here