Skip to main content

Digging into a long running job

Revisiting the blogger to finally start again blogging about the SQL Server after a long long time. 

This post is on analyzing and digging into what is causing a job significant time to complete. This will be a basic level of approach and I wish, I could have done better. But, its was enough in my case to get to a conclusion, with the knowledge I equip.  

Issue: A job that's scheduled to run every three hours, was running for more than 48hours. Average job run time, 1.5hrs. 

Analysis: First and foremost thing to do, is to find out on what session id is the job running on the instance. You may use the below query to fetch the session id of the job from the job name.

declare @Var1 varchar(36)
select @Var1=substring(REPLACE(CAST(job_id AS VARCHAR(36)),'-',''),17,16) from sysjobs where name ='JOB_NAME'
select * from sys.sysprocesses where program_name like '%'+@Var1+'%'

You may get a single row or multiple rows with the same spid number, which says, the query is running on parallelism with multiple subthreads. This basically depends on the MAX Degree of Parallelism that has been configured on the instance, which is 0 by default. 

After finding the job's session id, you will have to check the session status, which will be one of the following:

Process ID status:
dormant = SQL Server is resetting the session.
running = The session is running one or more batches. When Multiple Active Result Sets (MARS) is enabled, a session can run multiple batches. 
background = The session is running a background task, such as deadlock detection.
rollback = The session has a transaction rollback in process.
pending = The session is waiting for a worker thread to become available.
runnable = The task in the session is in the runnable queue of a scheduler while waiting to get a time quantum.
spinloop = The task in the session is waiting for a spinlock to become free.
suspended = The session is waiting for an event, such as I/O, to complete.

If the job is running a stored procedure, and if you want to know what statement of the stored procedure is currently running, with the spid in the below query, you can get the sql text.

select substring(st.text, (r.statement_start_offset/2)+1,
((case when r.statement_end_offset <=0 then datalength(st.text)
else r.statement_end_offset end-r.statement_start_offset)/2)+1) as Statement_Text
from sys.dm_exec_requests r
cross apply sys.dm_exec_sql_text(sql_handle) st
where session_id = <SPID>

Once all these details are present, you might want to check waittype column in the sysprocesses view. Below link is a repository of wait types, from which you can get much more insights.

http://blogs.msdn.com/b/psssql/archive/2009/11/03/the-sql-server-wait-type-repository.aspx

Also, you can check the query plan of the executing SQL Statement from the below query

SELECT * FROM sys.dm_exec_text_query_plan(plan_handle, statement_start_offset, statement_end_offset);
GO

The Parameters can be obtained from sys.dm_exec_requests 

select * from sys.dm_exec_requests where session_id = <spid>

The result of sys.dm_exec_text_query_plan executed above would give a query plan in xml format, which can be opened in SSMS to show a graphical execution plan for further analysis. You can save the output of the statement to 
.sqlplan file onto disk for any sort of file management.

If you find that a SPID is running on parallelism, and a single SPID has several rows returned in the output of sysprocesses, with a different ecid(Column that states subthread ID), then find what all wait types each ecid row is in. Usually, if you happen to see CXPACKET waittype(Status:Suspended), which is related to parallelism, it means that, some threads have completed and are waiting for the other threads(Status:Runnable/Other) to complete their part of execution. In this case, its important to find the waittype of the other running sub threads(ECID).
 
To check if the spid is consuming CPU and making some progress, then the below query would help find out that.

select * into #TEMP from sys.sysprocesses
GO
WAITFOR DELAY '00:00:10';
select T1.CPU-T2.CPU as CPU_Diff, T1.Physical_IO-T2.Physical_IO as IO_Diff, T1.* from sysprocesses T1, #TEMP T2 
where T1.SPID=T2.SPID and T1.Login_Time=T2.Login_Time and T1.ECID=T2.ECID Order by CPU_Diff DESC
GO
DROP TABLE #TEMP
GO



Comments

Popular posts from this blog

Deferred prepare could not be completed

This is usually an error that is encountered while querying a remote database on a different server instance which says OLEDB provider "SQLNCLI" for linked server "ABCD" returned message "Deffered prepare could not be completed" Msg 8180, Level 16, State 1, Line 1 Statement(s) could not be prepared. This is the way I could rectify, though this is not the lone mistake that gives this error every time. Just set the default to the database that is being querying to the user through which the Linked server is connecting to the database.

Script level upgrade for database 'master' failed because upgrade step 'sqlagent100_msdb_upgrade.sql'

As huge a database grows, that tedious are regular DBA activities become. Besides, a warehouse database, can grow with no bounds to challenge.  There was a patching on one of our databases, SP3 for SQL Server 2008 R2, which was failing due to the error below. As the error says, MASTER the heart of SQL Server  “Script level upgrade for database 'master' failed because upgrade step 'sqlagent100_msdb_upgrade.sql' encountered error 2627, state 1, severity 14. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion. Cannot recover the master database. SQL Server is unable to run. Restore master from a full

Key not valid for use in specified state - SSIS

You might see the following error while a package was newly imported and is set to be called from an SQL Agent through a job. The job would fail reporting the below error. Source:        Description: Failed to decrypt protected XML node "DTS:Property" with error 0x8009000B "Key not valid for use in specified state.". You may not be authorized to access this information. This error occurs when there is a cryptographic error. Verify that the correct key is available. End Error This is mainly because, the package protection level, which is by default set to  "EncryptWithSensitiveKey" Resolution: Set the package protection level to "DontSaveSensitive". But, if in case you know the password for the key, you could use it. How to change the SSIS Package Protection: In Business Intelligence Development Studio, open the Integration Services project that contains the package. Open the package in the SSIS designer. If the Properties window d