1

Resolved

Exception is thrown when lock previously held by a failed member is locked and then released by another one group member

description

Process A is write-locks "lock1", then it fails (I simulated a network crash by disabling the network adapter), view update is happened, then process B acquires write-lock (or read-lock - no matter) on "lock1", then it unlocks it and entire group fails with log message: Locking package exception: RELEASE by (2568:<ip1>/9753:9754) of lock1 held by [(2828:<ip2>/9753:9754)].
<ip1\2> - our real IP addresses.

Added:
It looks like lock type (read\write\generic) doesn't matter, so general case is: if process locks and then fails, and another one will try to lock\unlock the same lock (name) - group will fail.

comments

birman wrote May 3, 2016 at 12:57 PM

No problem, we can have a look.

Could you please post a minimal example of code that will trigger this problem? To diagnose it and fix it, reproducing it is important.

Ideally, a minimal example would have no lines of logic not actually required to cause the problem, plus of course instructions on precisely how to run it to see the issue occur.

konst3d wrote May 4, 2016 at 10:16 AM

Sorry for the wrong description in my first post - did quite intensive vsync\locking testing yesterday.
Here is updated steps to reproduce.
First - source code.
using System;
using Vsync;

namespace WriteLockApp
{
    class Program
    {
        static void Main(string[] args)
        {
            VsyncSystem.Start();

            var group = new Group("group1");

            group.ViewHandlers += view =>
            {
                Console.WriteLine("Group view changed. Addres: {0}, rank: {1}, total: {2}, alive: {3}, failed: {4}.",
                    view.gaddr,
                    view.GetMyRank(),
                    view.GetMembers().Length,
                    view.GetLiveMembers().Length,
                    view.GetFailedMembers().Length);
            };

            group.Join();

            Console.WriteLine("Press enter to ackquire write-lock.");
            Console.ReadLine();
            Console.WriteLine("Ackquiring write-lock...");

            if (group.WriteLock("lock1", 5000))
            {
                Console.WriteLine("Write-lock acquired, press enter to release...");
                Console.ReadLine();
                group.Unlock("lock1");
            }
            else
            {
                Console.WriteLine("Failed to ackquire write-lock.");
            }

            Console.WriteLine("Press any key to exit.");
            Console.ReadKey();

            group.Leave();
            VsyncSystem.Shutdown();
        }
    }
}
You will need 2 machines to reproduce the problem.

1) Run application on the fist machine, but do not acquire lock.
2) Run it on the second machine, acquire lock.
3) Close the application on the second machine using windows close button.
4) Wait for group view update in the first application.
5) Run the application on the second machine once again, acquire lock, release it.
6) Application on the first machine crashes with vsync exception (Locking package exception: RELEASE by (424) of lock1 held by [(2584)]).

birman wrote May 4, 2016 at 12:25 PM

Clearly this must be a bug. I'll fix it for you ASAP.

As a coding comment: Vsync doesn't do well if the thread that called VsyncSystem.Start() hangs. In your code that thread is the one that calls Console.ReadLine(), so in fact it does freeze up at that point. You should do your console reads in a different thread to avoid having the system think your application has crashed while in fact it is just waiting for you to type a newline.

birman wrote May 4, 2016 at 2:21 PM

OK, so I did reproduce the problem and will grant that this is a bug. But it does not manifest if I fix the blocking behavior of your code as follows:
...
class Program
{
    static void Main(string[] args)
    {
        VsyncSystem.Start();

        var group = new Group("group1");

        group.ViewHandlers += view =>
        {
            Console.WriteLine("Group view changed. Addres: {0}, rank: {1}, total: {2}, alive: {3}, failed: {4}.",
            view.gaddr,
            view.GetMyRank(),
            view.GetMembers().Length,
            view.GetLiveMembers().Length,
            view.GetFailedMembers().Length);
        };

        group.Join();

        Thread T = new Thread(delegate()
            {

                Console.WriteLine("Press enter to ackquire write-lock.");
                Console.ReadLine();
                Console.WriteLine("Ackquiring write-lock...");

                if (group.WriteLock("lock1", 5000))
                {
                    Console.WriteLine("Write-lock acquired, press enter to release...");
                    Console.ReadLine();
                    group.Unlock("lock1");
                }
                else
                {
                    Console.WriteLine("Failed to ackquire write-lock.");
                }

                Console.WriteLine("Press any key to exit.");
                Console.ReadKey();

                group.Leave();
                VsyncSystem.Shutdown();
            });
        T.Start();
        VsyncSystem.WaitForever();
    }
}
...

So my feeling is that on the one hand, you've exposed a kind of error in Vsync that I should fix. But basically this was caused by your bug, namely that the thread that called VsyncSystem.Start() is not permitted to block and normally should just call VsyncSystem.WaitForever(). In effect you forced a very non-standard thread scheduling sequence that exposes a kind of bug. My fix will turn out to throw an exception, though (something about how you have "violated the restriction against blocking in the main Vsync thread"). So I don't think you would want to wait for this fix before fixing your code! :-)

birman wrote May 4, 2016 at 5:55 PM

Now I'm wondering about my own comment above. In fact I'm not so sure your code was really breaking the "don't block in situations where you hold a Vsync lock" rule after all. More study needed. This said, it is in fact true that the issue vanishes if you modify the program as shown above.

birman wrote May 4, 2016 at 6:50 PM

Fixed in V2.2.2095. And in fact don't worry about my comment about blocking. Your code was ok.

konst3d wrote May 5, 2016 at 7:32 AM

Thanks a lot for the fix.
In fact, at the moment my real code is windows forms application and things like vsync startup, group join, lock\unlock etc... are inside button click handlers. The code I posted here is just minimal console apps to reproduce the issue.
Could you please confirm once again - VsyncSystem.WaitForever() is not required for stable vsync functioning, so I shouldn't rewrite my current code - vsync start and group join inside GUI event handlers and no calls to WaitForever() method?

birman wrote May 5, 2016 at 12:35 PM

Vsync 2.2.2095 is already posted over on the downloads page, but off on the right-hand side. We should let people who use locking test it more before it replaces the main release.

On your question, the main thing to realize is that:

1) The thread that called VsyncSystem.Start() is monitored by Vsync. If that thread terminates, Vsync auto-executes a shutdown.

2) If an upcall from Vsync to your application code blocks, this causes all of Vsync to malfunction. So when we deliver a message, a new view, or do some other form of upcall such as in the DHT collision resolver, you need to fork off a new thread prior to doing things like Console.ReadLine.

birman wrote May 5, 2016 at 12:37 PM

PS: VsyncSystem.WaitForever() does block, but the point is that it is normally called on a thread that YOU created that previously called VsyncSystem.Start() and now is about to terminate. You would not call it on other threads -- there wouldn't be any reason to do so and it would cause a leak of threads.

konst3d wrote May 5, 2016 at 1:45 PM

Well, so the thread that creates\starts vsync is better left for vsync.
And in case of windows forms application, where you cannot control the thread that executes GUI events handlers nor you can handle GUI in any other thread, the best solution is to create a separate one thread that will create vsync and then VsyncSystem.WaitForever() and forget about this thread - it will be terminated once vsync stopped, right?

birman wrote May 5, 2016 at 2:03 PM

Right. The shutdown causes WaitForever to return (so in fact it won't wait forever) and then the thread can terminate by just returning after disposing any resources of your own that you might want to clean up.