View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001590 | SymmetricDS | Bug | public | 2014-02-18 06:58 | 2024-08-21 18:47 |
Reporter | azarubkin | Assigned To | |||
Priority | normal | ||||
Status | closed | Resolution | fixed | ||
Product Version | 3.5.13 | ||||
Target Version | 3.5.22 | Fixed in Version | 3.5.22 | ||
Summary | 0001590: Router job deadlocked | ||||
Description | There are deadlocks happening from time to time. The log file says: {{{ 2014-02-18 09:05:45,406 ERROR [main] [RouterJob] [main-job-17] Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. org.jumpmind.db.sql.SqlException: Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. at org.jumpmind.db.sql.AbstractSqlTemplate.translate(AbstractSqlTemplate.java:288) at org.jumpmind.db.sql.AbstractSqlTemplate.translate(AbstractSqlTemplate.java:279) at org.jumpmind.db.sql.JdbcSqlTemplate.execute(JdbcSqlTemplate.java:417) at org.jumpmind.db.sql.JdbcSqlTemplate.update(JdbcSqlTemplate.java:285) at org.jumpmind.db.sql.AbstractSqlTemplate.update(AbstractSqlTemplate.java:212) at org.jumpmind.symmetric.service.impl.OutgoingBatchService.updateAbandonedRoutingBatches(OutgoingBatchService.java:119) at org.jumpmind.symmetric.service.impl.RouterService.routeData(RouterService.java:156) at org.jumpmind.symmetric.job.RouterJob.doJob(RouterJob.java:40) at org.jumpmind.symmetric.job.AbstractJob.invoke(AbstractJob.java:180) at org.jumpmind.symmetric.job.AbstractJob.run(AbstractJob.java:224) at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:53) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask.runAndReset(Unknown Source) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: java.sql.SQLException: Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:368) at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2820) at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2258) at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:632) at net.sourceforge.jtds.jdbc.JtdsStatement.processResults(JtdsStatement.java:584) at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQL(JtdsStatement.java:546) at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.execute(JtdsPreparedStatement.java:560) at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172) at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172) at org.jumpmind.db.sql.JdbcSqlTemplate$5.execute(JdbcSqlTemplate.java:308) at org.jumpmind.db.sql.JdbcSqlTemplate$5.execute(JdbcSqlTemplate.java:285) at org.jumpmind.db.sql.JdbcSqlTemplate.execute(JdbcSqlTemplate.java:415) ... 15 more }}} And this is the information from Microsoft SQL Server log: {{{ 2014-02-18 09:05:45.39 spid18s deadlock-list 2014-02-18 09:05:45.39 spid18s deadlock victim=process3ec52e8 2014-02-18 09:05:45.39 spid18s process-list 2014-02-18 09:05:45.39 spid18s process id=process3ec52e8 taskpriority=0 logused=0 waitresource=PAGE: 5:1:631780 waittime=828 ownerId=124466588 transactionname=UPDATE lasttranstarted=2014-02-18T09:05:44.443 XDES=0x1291d55d0 lockMode=U schedulerid=2 kpid=2136 status=suspended spid=52 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2014-02-18T09:05:44.443 lastbatchcompleted=2014-02-18T09:05:44.443 clientapp=jTDS hostname=SERVER-MAIN-1 hostpid=123 loginname=symmetricds isolationlevel=read committed (2) xactid=124466588 currentdb=5 lockTimeout=4294967295 clientoption1=671088672 clientoption2=128056 2014-02-18 09:05:45.39 spid18s executionStack 2014-02-18 09:05:45.39 spid18s frame procname=adhoc line=1 stmtstart=78 sqlhandle=0x020000000755fa071b8c380923157f23928fec9bb98b3ec9 2014-02-18 09:05:45.39 spid18s update sym_outgoing_batch set status= @P0 where status = @P1 2014-02-18 09:05:45.39 spid18s inputbuf 2014-02-18 09:05:45.39 spid18s (@P0 nvarchar(4000),@P1 nvarchar(4000))update sym_outgoing_batch set status= @P0 where status = @P1 2014-02-18 09:05:45.39 spid18s process id=process5549438 taskpriority=0 logused=760 waitresource=PAGE: 5:1:631232 waittime=828 ownerId=124466602 transactionname=implicit_transaction lasttranstarted=2014-02-18T09:05:44.550 XDES=0x80034dd0 lockMode=IX schedulerid=8 kpid=556 status=suspended spid=54 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2014-02-18T09:05:44.550 lastbatchcompleted=2014-02-18T09:05:44.550 clientapp=jTDS hostname=SERVER-MAIN-1 hostpid=123 loginname=symmetricds isolationlevel=read committed (2) xactid=124466602 currentdb=5 lockTimeout=4294967295 clientoption1=671088672 clientoption2=128058 2014-02-18 09:05:45.39 spid18s executionStack 2014-02-18 09:05:45.39 spid18s frame procname=adhoc line=1 stmtstart=832 sqlhandle=0x02000000ee7e1d2d3e775b0d293df106757e957f0d46bd2f 2014-02-18 09:05:45.39 spid18s update sym_outgoing_batch set status= @P0 , load_id= @P1 , extract_job_flag= @P2 , load_flag= @P3 , error_flag= @P4 , byte_count= @P5 , extract_count= @P6 , sent_count= @P7 , load_count= @P8 , data_event_count= @P9 , reload_event_count= @P10 , insert_event_count= @P11 , update_event_count= @P12 , delete_event_count= @P13 , other_event_count= @P14 , ignore_count= @P15 , router_millis= @P16 , network_millis= @P17 , filter_millis= @P18 , load_millis= @P19 , extract_millis= @P20 , sql_state= @P21 , sql_code= @P22 , sql_message= @P23 , failed_data_id= @P24 , last_update_hostname= @P25 , last_update_time= @P26 where batch_id= @P27 and node_id= @P28 2014-02-18 09:05:45.39 spid18s inputbuf 2014-02-18 09:05:45.39 spid18s (@P0 nvarchar(4000),@P1 bigint,@P2 decimal(38,0),@P3 decimal(38,0),@P4 decimal(38,0),@P5 bigint,@P6 bigint,@P7 bigint,@P8 bigint,@P9 bigint,@P10 bigint,@P11 bigint,@P12 bigint,@P13 bigint,@P14 bigint,@P15 bigint,@P16 bigint,@P17 bigint,@P18 bigint,@P19 bigint,@P20 bigint,@P21 nvarchar(4000),@P22 decimal(38,0),@P23 nvarchar(4000),@P24 bigint,@P25 nvarchar(4000),@P26 datetime,@P27 decimal(38,0),@P28 nvarchar(4000))update sym_outgoing_batch set status= @P0 , load_id= @P1 , extract_job_flag= @P2 , load_flag= @P3 , error_flag= @P4 , byte_count= @P5 , extract_count= @P6 , sent_count= @P7 , load_count= @P8 , data_event_count= @P9 , reload_event_count= @P10 , insert_event_count= @P11 , update_event_count= @P12 , delete_event_count= @P13 , other_event_count= @P14 , ignore_count= @P15 , router_millis= @P16 , network_millis= @P17 , filter_millis= @P18 , load_millis= @P19 , extract_millis= @P20 , sql_state= @P21 , sql_code= @P22 , sql_message= @P23 , failed_data_id= @P24 , last_update_hostname= @P25 , last_update_time= 2014-02-18 09:05:45.39 spid18s resource-list 2014-02-18 09:05:45.39 spid18s pagelock fileid=1 pageid=631780 dbid=5 objectname=guard.symmetricds.sym_outgoing_batch id=lock824c4280 mode=IX associatedObjectId=72057594055557120 2014-02-18 09:05:45.39 spid18s owner-list 2014-02-18 09:05:45.39 spid18s owner id=process5549438 mode=IX 2014-02-18 09:05:45.39 spid18s waiter-list 2014-02-18 09:05:45.39 spid18s waiter id=process3ec52e8 mode=U requestType=wait 2014-02-18 09:05:45.39 spid18s pagelock fileid=1 pageid=631232 dbid=5 objectname=guard.symmetricds.sym_outgoing_batch id=lock8c1c4f80 mode=U associatedObjectId=72057594055557120 2014-02-18 09:05:45.39 spid18s owner-list 2014-02-18 09:05:45.39 spid18s owner id=process3ec52e8 mode=U 2014-02-18 09:05:45.39 spid18s waiter-list 2014-02-18 09:05:45.39 spid18s waiter id=process5549438 mode=IX requestType=wait }}} Hope it helps you to understand what's going on. | ||||
Tags | dialect: sql-server, routing | ||||
|
How often does this happen? Do you ever do maintenance on your sym_ tables? I think SQL Server is escalating a lock to a page lock which causes the deadlock. You could try the following: ALTER INDEX ALL ON SYM_OUTGOING_BATCH SET (ALLOW_ROW_LOCKS = ON); ALTER TABLE SYM_OUTGOING_BATCH SET (LOCK_ESCALATION = DISABLE); |
|
It happens several times per day, usually. I haven't touched sym_ tables except initial configuration. Purge job runs every 5 days, however, as far as I remember default configuration. I think it's better to explicitly disable page locks on indices, since row locks are already enabled by default (I checked): ALTER INDEX ... WITH (ALLOW_PAGE_LOCKS = OFF, ALLOW_ROW_LOCKS = ON); And then the second line about disabling lock escalation... Should I try them one by one, or both at the same time? |
|
I'd try one by one. I'd also set purge to be more aggressive and periodically reorg the runtime sym_ tables. ALTER INDEX ALL ON SYM_INCOMING_BATCH REORGANIZE; ALTER INDEX ALL ON SYM_OUTGOING_BATCH REORGANIZE; ALTER INDEX ALL ON SYM_DATA_GAP REORGANIZE; ALTER INDEX ALL ON SYM_DATA_EVENT_BATCH REORGANIZE; ALTER INDEX ALL ON SYM_DATA REORGANIZE; |
|
Thank you, I'll try it. By the way, SQL Server never escalates row lock to page lock. It escalates them to table lock. So, page locks were there from the very beginning of transactions, it's not the escalation, it's the query plan. Moreover, if the escalation fails, the transaction continues and escalation is retried later. Here we see a deadlock on sym_idx_ob_node_status index, it's another situation. It emerges because SQL Server needs to update sym_idx_ob_node_status index when we update rows in table. But it needs to scan (and possibly lock) many pages in the index, because the rows where status = @P1 are scattered. I think if the index was created as (status; node_id), i.e. the order of columns was reversed, SQL Server would need to scan lesser amount of rows (and maybe avoid page locks at all). Moreover, we wouldn't need another index (sym_idx_ob_status) then, because the first column from sym_idx_ob_node_status could be used instead. Please consider this proposal. I think it's better than disabling page locks because it places less burden on server. |
|
Of course this will work if there are no queries like "SELECT * FROM SYM_OUTGOING_BATCH WHERE NODE_ID = @P0". In that case current index on (node_id; status) should be preserved or another index on (node_id) only should be created. I think I can assist you in checking different index configurations, if you want. |
|
Any help in optimization is very much appreciated. If you want to do some analysis yourself, you could actually modify the symmetric-schema.xml in symmetric-core-X.XX.jar to change the indexes. Jar files can be edited by most archive tools. I think you could also extract symmetric-schema.xml and put it in the patches directory (versus updating the jar file). Editing the schema xml should be pretty straightforward. We'd be happy to review any possibly apply to the next release improvements in the indexes. |
Date Modified | Username | Field | Change |
---|---|---|---|
2014-02-18 06:58 | azarubkin | New Issue | |
2014-02-18 12:21 | chenson | Note Added: 0000481 | |
2014-02-18 13:17 | azarubkin | Note Added: 0000482 | |
2014-02-18 13:26 | chenson | Note Added: 0000483 | |
2014-02-18 14:09 | azarubkin | Note Added: 0000484 | |
2014-02-18 14:11 | azarubkin | Note Added: 0000485 | |
2014-02-18 14:12 | azarubkin | Note Edited: 0000485 | View Revisions |
2014-02-19 01:36 | chenson | Note Added: 0000487 | |
2019-04-22 16:47 | elong | Tag Attached: routing | |
2019-04-22 16:47 | elong | Tag Attached: mssql | |
2019-04-23 20:49 | admin | Tag Renamed | mssql => dialect: mssql |
2019-04-24 12:50 | admin | Tag Renamed | dialect: mssql => dialect: sql-server |
2020-05-21 16:55 | elong | Relationship added | related to 0001708 |
2020-05-21 16:56 | elong | Status | new => resolved |
2020-05-21 16:56 | elong | Resolution | open => fixed |
2020-05-21 16:56 | elong | Fixed in Version | => 3.5.22 |
2024-08-21 17:13 | cquamme | Assigned To | => cquamme |
2024-08-21 17:13 | cquamme | Status | resolved => closed |
2024-08-21 18:47 | cquamme | Assigned To | cquamme => |