SQL-Server Temp Database Space Issue
Every SQL-Server professional knows that the Master database is the brain of SQL-Server and the Temp database is the spinal cord of SQL-Server. Once it stops, everything starts failing.
In mid-Aug-2021, We faced an interesting problem. The temp database was getting filled every week. It was failing EMR (Data Warehouse and Application) Jobs and Application. My team was under tremendous pressure to get it to fix. One thing I would like to mention is that there is a difference in size allocation and temp data size use.
Below are the points we had tried before the real fix: –
- We had monitored the Jobs, like which jobs were filling the database.
- We had Checked and Tweaked heavy queries with the help of Query-Store.
- Temp database captures the space on the server till it needs it and then Stops. So, we had increased the Temp database size.
- We had started to check the SQL server configuration. Everything was correct at the configuration level. Like 250 GB drive was allocated to Temp database only, having multiple logs file and auto increase property set to true.
The above changes gave us some relaxation. But it had not stopped completely. So after four days, We again got the “full temp database” issue.
Then we have checked all the databases configuration for the recovery model for all our databases was set to “FULL”. Because the client needed the “In Time Recovery”. So, we had thought this might be this can be causing the issue.
Although the back of the mind, we were clear that the recovery model is only related to a specific database transaction log and has no relation to the temp database. But with a valid reason, that reporting database and data hub can be regenerated again using Jobs and a day-old backup. So, we had set the Simple Recovery model to Datahub and Reporting Databases. But it had not fixed the issue.
Then we had checked all the open sleeping sessions, which were not allowing the temp database to release the space using the below query:
SELECT DES.session_id AS [SESSION ID],Db_name(DDSSU.database_id) AS [DATABASE Name],host_name AS [System Name],program_name AS [Program Name], login_name AS [USER Name],status,total_elapsed_time AS [Elapsed TIME (in milisec)], ( memory_usage * 8 ) AS [Memory USAGE (in KB)],CASE is_user_process WHEN 1 THEN ‘user session’ WHEN 0 THEN ‘system session’ END AS [SESSION Type], row_count AS [ROW COUNT] FROM tempdb.sys.dm_db_session_space_usage AS DDSSU INNER JOIN sys.dm_exec_sessions AS DES ON DDSSU.session_id = DES.session_id ORDER BY 7 DESC
These screenshots are not real, they are just for a reference purpose to show query output.
From query output, We were able to verify each session query. All queries were of short duration and mostly related to application sessions or user executed small queries. So, nothing was blocking the temp database from release space.
Query to Check session query
CROSS APPLY sys.dm_exec_sql_text(most_recent_sql_handle)
WHERE session_id = (53)
Now we had started checking what type of objects were taking space on temp database. using below query
SELECT SUM(unallocated_extent_page_count) AS [free pages], SUM(unallocated_extent_page_count+ user_object_reserved_page_count+ internal_object_reserved_page_count+ mixed_extent_page_count+ version_store_reserved_page_count) * (8.0/1024.0/1024.0) AS [Total TempDB SizeInGB], SUM(unallocated_extent_page_count * (8.0/1024.0/1024.0)) AS [Free TempDB SpaceInGB],unallocated_extent_page_count,user_object_reserved_page_count ,SUM(version_store_reserved_page_count * (8.0/1024.0/1024.0)) AS [version_store_GB],internal_object_reserved_page_count,mixed_extent_page_count FROM tempdb.sys.dm_db_file_space_usage group by unallocated_extent_page_count,user_object_reserved_page_count,internal_object_reserved_page_count,mixed_extent_page_count;
We found that two spaces were filled (Internal objects and version store GB). When the application had a huge load or jobs were running. After all the jobs were finished and the application load was settled down. The internal objects released their space back to the server. But row version store not.
Then we have checked the default Isolation Level of every database and found default Isolation was Snapshot Isolation. Now, this was a real problem. Because we were aware that EMR Jobs were moving large data here and there and keep the row version through snapshot isolation didn’t make sense. Also, Snapshot Isolation is good when you have a multi-user environment. However, on Datahub and Reporting databases most data is not changeable after the job complete. So, we have changed the Isolation level of reporting and datahub databases to “Read Committed”. This change was in transition mode for five hours. After that, we had shrunk the temp database.
Now all Jobs are running fine and the application databases are still “Full recovery” model for In Time Recovery. So, our changes were no code change and fixed the Temp Database Issue.
It is wonderful learning, so thought to share it with the community.
Chinmay Kant Jain