Skip to main content

Digging into a long running job

Revisiting the blogger to finally start again blogging about the SQL Server after a long long time. 

This post is on analyzing and digging into what is causing a job significant time to complete. This will be a basic level of approach and I wish, I could have done better. But, its was enough in my case to get to a conclusion, with the knowledge I equip.  

Issue: A job that's scheduled to run every three hours, was running for more than 48hours. Average job run time, 1.5hrs. 

Analysis: First and foremost thing to do, is to find out on what session id is the job running on the instance. You may use the below query to fetch the session id of the job from the job name.

declare @Var1 varchar(36)
select @Var1=substring(REPLACE(CAST(job_id AS VARCHAR(36)),'-',''),17,16) from sysjobs where name ='JOB_NAME'
select * from sys.sysprocesses where program_name like '%'+@Var1+'%'

You may get a single row or multiple rows with the same spid number, which says, the query is running on parallelism with multiple subthreads. This basically depends on the MAX Degree of Parallelism that has been configured on the instance, which is 0 by default. 

After finding the job's session id, you will have to check the session status, which will be one of the following:

Process ID status:
dormant = SQL Server is resetting the session.
running = The session is running one or more batches. When Multiple Active Result Sets (MARS) is enabled, a session can run multiple batches. 
background = The session is running a background task, such as deadlock detection.
rollback = The session has a transaction rollback in process.
pending = The session is waiting for a worker thread to become available.
runnable = The task in the session is in the runnable queue of a scheduler while waiting to get a time quantum.
spinloop = The task in the session is waiting for a spinlock to become free.
suspended = The session is waiting for an event, such as I/O, to complete.

If the job is running a stored procedure, and if you want to know what statement of the stored procedure is currently running, with the spid in the below query, you can get the sql text.

select substring(st.text, (r.statement_start_offset/2)+1,
((case when r.statement_end_offset <=0 then datalength(st.text)
else r.statement_end_offset end-r.statement_start_offset)/2)+1) as Statement_Text
from sys.dm_exec_requests r
cross apply sys.dm_exec_sql_text(sql_handle) st
where session_id = <SPID>

Once all these details are present, you might want to check waittype column in the sysprocesses view. Below link is a repository of wait types, from which you can get much more insights.

http://blogs.msdn.com/b/psssql/archive/2009/11/03/the-sql-server-wait-type-repository.aspx

Also, you can check the query plan of the executing SQL Statement from the below query

SELECT * FROM sys.dm_exec_text_query_plan(plan_handle, statement_start_offset, statement_end_offset);
GO

The Parameters can be obtained from sys.dm_exec_requests 

select * from sys.dm_exec_requests where session_id = <spid>

The result of sys.dm_exec_text_query_plan executed above would give a query plan in xml format, which can be opened in SSMS to show a graphical execution plan for further analysis. You can save the output of the statement to 
.sqlplan file onto disk for any sort of file management.

If you find that a SPID is running on parallelism, and a single SPID has several rows returned in the output of sysprocesses, with a different ecid(Column that states subthread ID), then find what all wait types each ecid row is in. Usually, if you happen to see CXPACKET waittype(Status:Suspended), which is related to parallelism, it means that, some threads have completed and are waiting for the other threads(Status:Runnable/Other) to complete their part of execution. In this case, its important to find the waittype of the other running sub threads(ECID).
 
To check if the spid is consuming CPU and making some progress, then the below query would help find out that.

select * into #TEMP from sys.sysprocesses
GO
WAITFOR DELAY '00:00:10';
select T1.CPU-T2.CPU as CPU_Diff, T1.Physical_IO-T2.Physical_IO as IO_Diff, T1.* from sysprocesses T1, #TEMP T2 
where T1.SPID=T2.SPID and T1.Login_Time=T2.Login_Time and T1.ECID=T2.ECID Order by CPU_Diff DESC
GO
DROP TABLE #TEMP
GO



Comments

Popular posts from this blog

Deferred prepare could not be completed

This is usually an error that is encountered while querying a remote database on a different server instance which says OLEDB provider "SQLNCLI" for linked server "ABCD" returned message "Deffered prepare could not be completed" Msg 8180, Level 16, State 1, Line 1 Statement(s) could not be prepared. This is the way I could rectify, though this is not the lone mistake that gives this error every time. Just set the default to the database that is being querying to the user through which the Linked server is connecting to the database.

Incorrect SET Options with Filtered Indexes

Rebuild Index|Create Index|DML on a Table with Filtered Index fails with the error: The following SET options have incorrect settings: 'QUOTED_IDENTIFIER'. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations. [SQLSTATE 42000] (Error 1934).  The step failed. The Operation above might succeed when its through SSMS. But, would fail when the operation is from a SQL Agent Job. This is because, SQL Agent by default, do not have the SET Options enabled (ON). They are OFF. Reason: The following are the SET Options that are needed while operation on a table with a filtered index. Required SET Options for Filtered Indexes The SET options in the Required Value column are required whenever any of the following conditions occur: Create a filtered index. INSERT, UPDATE, DELETE, or MERGE operation modifies the data in a...