Thread: Intermittent Locking Issue
Sat, Jun 3 2017 2:51 PM

Kevin Koehne

We are having an intermittent issue at a large customer (15 GB database, 100+ users) which generally occurs during their high-transaction times. The system will be running fine when suddenly all of our processes start hitting a locked table, and they are unable to continue working until we restart the Elevate service.

My current theory is that a user is executing a long-running report/query (for example, looking at a year's worth of data in a complex report), and the tables used in the query are then locked from being updated.

I've only got circumstantial evidence that this may be the issue. I log our report execution times, and when this issue flares up, there appears to be a long-running report running at the time. "Circumstantial" because the report may be experiencing the same locking issue and simply can't execute in a timely manner.

It's tough to ask the customer "Who's running a big report?", because there are so many users - and none will admit to it :-)

Is there any way to determine who has a particular table locked? We use the KeepTablesOpen = true property on the sessions, so looking at the ServerSessionsLock table shows everyone has the table locked (type='Shared'). We rarely use exclusive locks - generally only for schema updates.

I've also experienced issues viewing the ServerSessionsLock table while executing a long-running report: I get an error indicating that it cannot lock the session manager, which makes it even tougher to see what's going on.

Another theory is that deadlocking is occurring. Is there any way to detect a deadlock situation?
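For context, deadlock detection in general amounts to building a wait-for graph of sessions and checking it for a cycle; a minimal sketch, with hypothetical session IDs and no tie to any EDB API:

```python
# Minimal wait-for-graph deadlock detector (generic sketch, not EDB-specific).
# Each edge (a, b) means "session a is waiting for a lock held by session b".
# A deadlock exists exactly when the graph contains a cycle.

def find_deadlock(waits_for):
    """Return one cycle of session IDs if a deadlock exists, else None."""
    graph = {}
    for waiter, holder in waits_for:
        graph.setdefault(waiter, []).append(holder)

    visiting, done = set(), set()

    def dfs(node, path):
        if node in visiting:
            return path[path.index(node):]   # back-edge: cycle found
        if node in done or node not in graph:
            done.add(node)
            return None
        visiting.add(node)
        for nxt in graph[node]:
            cycle = dfs(nxt, path + [nxt])
            if cycle:
                return cycle
        visiting.discard(node)
        done.add(node)
        return None

    for start in list(graph):
        cycle = dfs(start, [start])
        if cycle:
            return cycle
    return None

# Session 7 waits on 9 and 9 waits on 7: a classic two-party deadlock.
print(find_deadlock([(7, 9), (9, 7), (3, 7)]))  # prints [7, 9, 7]
```

The hard part, of course, is getting the wait-for edges out of the server in the first place.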

We are running the 64-bit server, version 2.24.0.2.

Any help is appreciated.

Thanks,
Kevin
Sun, Jun 4 2017 2:33 AM

Roy Lambert

NLH Associates

Team Elevate

Kevin


Have you looked at the system logs in EDBManager?

Open EDBManager, click on the session, then View Logged Events (just in case you didn't know where it was).

Depending on how the users log in, it will tell you the user; if everyone uses the same ID, you've had it.

The only thing I can think of for deadlocks is to install MadExcept and report when the application freezes. It won't help day to day but may help in identifying what / where the problem is.

Roy Lambert
Sun, Jun 4 2017 7:31 AM

Kevin Koehne

Thanks Roy.

Yea - the system logs aren't much use because we do use a single user. Even then, there aren't any unusual errors in the log.

Regarding deadlocks, I was thinking there may be something occurring more at the DB level. MadExcept would tell me that the app is locked - but that is the symptom right now across all processes. When the issue occurs, all users and processes 'hang' waiting for some lock in the database to release.

I'm still working on the theory that the issue is due to a long running report or process, that is run sporadically, that is causing the DB to lock some of the app-critical tables. It's a tough one to debug because it's a fairly complex system. I've instrumented most of the high-use and centralized functions (e.g. I can track when, how long and what parameters are used when reports run, and also track URLs hit in our web service) but our Windows client application has a lot of custom functionality that is difficult to add monitoring code to.

I've considered adding logging of all SQL executions and timings; which would generate a LOT of data. I was hoping that there would be an Elevate tool that had some additional tooling I could utilize to help determine where the locking was occurring.
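As a sketch of what that SQL logging could look like, here is a thin wrapper around whatever execute call the application already uses, with a threshold so only slow statements get logged; `execute_fn`, the logger name, and the threshold value are hypothetical stand-ins, not an EDB API:

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("sqltiming")

SLOW_THRESHOLD_SECS = 5.0   # only log statements slower than this, to cut volume


def timed_execute(execute_fn, sql, *params):
    """Run a statement through the app's existing execute function and log
    its text, parameters, and duration when it exceeds the threshold."""
    start = time.perf_counter()
    try:
        return execute_fn(sql, *params)
    finally:
        elapsed = time.perf_counter() - start
        if elapsed >= SLOW_THRESHOLD_SECS:
            log.info("SLOW %.2fs: %s params=%r", elapsed, sql, params)
```

Logging only statements over a threshold would keep the volume down while still catching the long-running culprits.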

Kevin
Sun, Jun 4 2017 9:31 AM

Adam Brett

Orixa Systems

Kevin

It sounds like you have a horrible gnarly problem on your hands.

I have the following pretty minimal SQL I run regularly on my larger systems to analyse levels of memory use:

SELECT
 SessionName,
 REPLACE(',' WITH ', ' IN CAST(LIST(DISTINCT Function) as VARCHAR(500))) as Function,
 CAST(Description as VARCHAR(500)) as Description,
 MAX(LogTimeStamp) as DateCreated
FROM Configuration.LogEvents
WHERE Category = 'Error'
GROUP BY SessionName, Description
ORDER BY DateCreated DESC

--

SELECT
 SSS.TableName,
 SUM(SSS.CurrentBufferSize) as MemoryUse,
 SUM(Writes) as Writes
FROM Configuration.ServerSessionStatistics SSS
WHERE NOT TableName = 'EditHistory'
GROUP BY TableName
HAVING MemoryUse > 0
ORDER BY MemoryUse DESC


--

SELECT
 Process as User,
 IF(EXTRACT(Hour FROM LastConnected) > 9 THEN '' ELSE '0')
 + CAST(EXTRACT(Hour FROM LastConnected) AS VARCHAR(10)) + ':' +
 IF(EXTRACT(Minute FROM LastConnected) > 9 THEN '' ELSE '0')
 + CAST(EXTRACT(Minute FROM LastConnected) AS VARCHAR(10)) as Connected,
 SUM(SSS.CurrentBufferSize) as MemoryUse,
 SUM(Writes) as Writes
FROM Configuration.ServerSessions SS
LEFT JOIN Configuration.ServerSessionStatistics SSS ON SS.ID = SSS.SessionID
GROUP BY Process
HAVING MemoryUse > 0
ORDER BY MemoryUse DESC

--

I also have some audit-trail SQL, which I can't easily share as it uses data-tables I add to my systems. These log actions staff undertake such as opening and running reports, accessing data etc. from within the Delphi Application that calls EDB.

These audit data-tables have been useful for the purpose you describe, and were not hugely complex to implement. I just capture the staffID, a timestamp, and the target (calling code-line in the application).

This can work, but it has the problem that you are using the database to log issues with the database :-) So when the database seizes up, your problem-solving tool stops working too ...

It is possible to get around this by running 2 instances of EDB Server (on separate ports such as 12010 and 12011) and having "Instance 2" responsible for error logging, with its own database to collect data. As the error logging is relatively light-weight it should not unbalance your server.
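The same separation can be sketched generically: route audit writes through a function that prefers the dedicated logging instance but falls back to a local file when that write fails, so the diagnostic trail survives even when the primary database seizes up. Both writer callables here are hypothetical stand-ins for real connections:

```python
def make_audit_logger(db_write, file_write):
    """Return a log function that prefers the dedicated logging database
    but falls back to a local file if the database write fails."""
    def audit(record):
        try:
            db_write(record)
            return "db"        # record landed in the logging instance
        except Exception:
            file_write(record)
            return "file"      # logging instance down: keep the trail locally
    return audit


# Usage: wire in the real writers once at startup.
captured = []
audit = make_audit_logger(captured.append, lambda r: None)
audit("report started")
```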
Sun, Jun 4 2017 10:04 AM

Roy Lambert

NLH Associates

Team Elevate

Kevin


>Yea - the system logs aren't much use because we do use a single user.

Silly boy :-)

>Even then, there aren't any unusual errors in the log.

If your guess is right there wouldn't be any; just everything gradually coming to a halt :-(

>Regarding deadlocks, I was thinking there may be something occurring more at the DB level. MadExcept would tell me that the app is locked - but that is the symptom right now across all processes. When the issue occurs, all users and processes 'hang' waiting for some lock in the database to release.

What I was thinking was that MadExcept would generate an error report - most would show the freeze, but one should show something different, and it might allow you to focus down.

>I'm still working on the theory that the issue is due to a long running report or process, that is run sporadically, that is causing the DB to lock some of the app-critical tables. It's a tough one to debug because it's a fairly complex system. I've instrumented most of the high-use and centralized functions (e.g. I can track when, how long and what parameters are used when reports run, and also track URLs hit in our web service) but our Windows client application has a lot of custom functionality that is difficult to add monitoring code to.

I loathe intermittent errors; give me a good old crash any day.

I can think of a couple of things to try if you get desperate, but retrofitting stuff to a nice stable application is a nightmare. How easy would it be to add code around the launcher for the long-running reports? That prompts another question - how long is a long-running report?

>I've considered adding logging of all SQL executions and timings; which would generate a LOT of data. I was hoping that there would be an Elevate tool that had some additional tooling I could utilize to help determine where the locking was occurring.

I'd suggest opening a support ticket with Tim - he may well know a good way to instrument the server.

Roy Lambert
Mon, Jun 5 2017 4:01 AM

Matthew Jones

Kevin Koehne wrote:

> I've considered adding logging of all SQL executions and timings; which would generate a LOT of data.

It will be a pain, but I think it is the best way forward. I have this as an option in many of my applications, usually only in debug mode with the SQL for obvious reasons, but if you are getting a deadlock, then you need facts and not speculation. Too many times I have been accused of some issue but when I get down and forensic there is a problem elsewhere (often disk full or the server isn't there!).

You'd need to be logging each workstation fully, otherwise it isn't worth doing. In my log system (I love Codesite, but it is more suited to the debugging/component sort of thing, so I have my own too, which also writes to Codesite!) it auto-rotates the logs according to the day of the week. The first log message on Tuesday deletes Wednesday.log and writes to Tuesday.log. This stops the log files filling disks. Also you need the time, of course.

Then, when the next log happens, grab every machine's log and see what was happening...

Hmm, I have an option in mine which is a boolean for "tight logs but cra^H^H^H poor performance". This closes the log file after every write, to ensure it is on disk if I have to kill the process. If you can't close down nicely, you may lose the vital part of the logs.
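That rotate-and-flush scheme can be sketched generically (a hypothetical Python illustration; a Delphi application would do the same with TFileStream or similar):

```python
import os
from datetime import date, timedelta


def log_line(msg, log_dir=".", today=None):
    """Append msg to <Weekday>.log in log_dir. Deleting tomorrow's file on
    every write means each weekday file holds at most one week of data, and
    closing the file after every write keeps the tail safe if the process
    is killed (slow, but crash-safe)."""
    today = today or date.today()
    path = os.path.join(log_dir, today.strftime("%A") + ".log")
    tomorrow = os.path.join(
        log_dir, (today + timedelta(days=1)).strftime("%A") + ".log")
    if os.path.exists(tomorrow):
        os.remove(tomorrow)   # e.g. a message on Tuesday deletes Wednesday.log
    with open(path, "a") as f:  # open/append/close per write
        f.write(msg + "\n")
```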

--

Matthew Jones
Mon, Jun 5 2017 9:37 AM

Kevin Koehne

Adam Brett wrote:

> It sounds like you have a horrible gnarly problem on your hands.
<

For sure. :-)

Thanks for the monitoring SQL. I'll try out some of that.

> It is possible to get around this by running 2 instances of EDB Server (on separate ports such as 12010 and 12011) and having "Instance 2" responsible for error logging, with its own database to collect data. As the error logging is relatively light-weight it should not unbalance your server.
<

I like that idea of a separate instance. It may take a while to get that set up, but I am going to investigate that.
Thanks for the input Adam.
Mon, Jun 5 2017 9:44 AM

Kevin Koehne

Roy Lambert wrote:

Kevin


>>Yea - the system logs aren't much use because we do use a single user.

Silly boy :-)
<<

Yea ;-)


>I loathe intermittent errors; give me a good old crash any day.

I can think of a couple of things to try if you get desperate but retrofitting stuff to a nice stable application is a nightmare. How easy would it be to add code around the launcher for the long running reports? That prompts another question - how long is a long running report?
<
A long-running report is usually one where a user mistyped some criteria. I've had users call me and say their report has been running for 30 minutes(!) when it usually takes less than 30 seconds to run. It's because they asked for a year's worth of data.

>>I've considered adding logging of all SQL executions and timings; which would generate a LOT of data. I was hoping that there would be an Elevate tool that had some additional tooling I could utilize to help determine where the locking was occurring.

I'd suggest opening a support ticket with Tim - he may well know a good way to instrument the server.
>

Thanks - I'll do that. I did notice that there is now a Tracing function in the latest release of the server.
Mon, Jun 5 2017 9:52 AM

Kevin Koehne

"Matthew Jones" wrote:

>> I've considered adding logging of all SQL executions and timings; which would generate a LOT of data.

It will be a pain, but I think it is the best way forward.
>
I think so too. My biggest challenges: there are a lot of users and several automated processes, and it's an intermittent issue, so it may run several days without a problem - so there will be a lot of data.

> it auto-rotates the logs according to the day of the week. First log message on tuesday deletes Wednesday.log and writes to Tuesday.log. This stops the log files filling disks.
>
This would help with the quantity of data. Given what I'm seeing currently, we know about the slow-down within an hour, so I could rotate the log hourly.

Thanks for the input.
Mon, Jun 5 2017 9:55 AM

Matthew Jones

Kevin Koehne wrote:

> so there will be a lot of data

My favourite is when the support people send me a file and ask me what the problem was with the system. I usually reply saying that while I could look at each of the 300,000 lines for the day, it might be helpful to have a hint as to what might have gone wrong, and when.

You have a perfect situation in that you will know the right time to be looking at. I often load the big files (from each system) and then just cut out everything more than ten mins before, or whatever seems right (I can always go back and get more if needed). This allows me to focus on the relevant parts more easily.

As I say, it is quite forensic - it is not a simple task to dig into logs.

--

Matthew Jones