Posted by Ancestry Team on October 17, 2014 in C#, Development, Distributed Computing

Here at Ancestry.com, we use Microsoft’s High Performance Computing (HPC) cluster for a variety of workloads, and my team alone runs several kinds of jobs on it.  Interestingly enough, no two of our job types communicate with HPC in exactly the same way.  We’re using the Service Oriented Architecture (SOA) model for two of our use cases, but even those communicate differently.

Recently, I was working on a problem where I wanted our program to know exactly how many tasks in a job had completed (not just the percentage of progress), similar to what can be seen in HPC Job Manager.  The code for these HPC jobs uses the BrokerClient to send tasks.  With the BrokerClient, you can “fire and forget”, which is what this solution does.  I should note that the BrokerClient can retrieve results after the job is finished, but that wasn’t my use case.  I thought there should be a simple way to ask HPC how many tasks had completed.  It turns out that this is not as easy as you might expect when using the SOA model, and I couldn’t find any documentation on how to do it.  I found a solution that worked for me, and I thought I’d share it.

HPC Session Request Breakdown, as shown in HPC Job Manager

With a BrokerClient, your link back to the HPC job comes from the Session object used to create the BrokerClient.  From a Scheduler, you can get the ISchedulerJob that corresponds to the Session by matching the ISchedulerJob.Id to the Session.Id.  My first thought was to use ISchedulerJob.GetTaskList() to retrieve the tasks and look at the task details.  It turns out that for SOA jobs, tasks do not correspond to requests.  The tasks don’t have any methods to indicate how many requests they’ve fulfilled, either.

I found my solution while looking at the results of the ISchedulerJob.GetCustomProperties() method.  I was surprised to find it there, since the MSDN documentation describes these as “application-defined properties”.

I found four name-value pairs which may be useful for knowing the state of tasks in a SOA job, with the following keys:

  • “HPC_Calculating”
  • “HPC_Calculated”
  • “HPC_Faulted”
  • “HPC_PurgedProcessed”

I should note that some of these properties don’t exist when the job is brand new, with no requests sent to it yet.  Also, I was disappointed to find no key corresponding to the “incoming” requests, since some applications might not be able to calculate that themselves.

With that information, I was able to write code to monitor the SOA jobs.
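
A minimal sketch of that kind of monitoring code might look like the following.  It is illustrative only, not the code from our project.  The SoaRequestCounts class and its members are hypothetical names, and the sketch assumes that each property value is an integer stored as a string and that the second key is spelled “HPC_Calculated”; check what your own cluster returns.  Missing keys are treated as zero, since some properties don’t exist on a brand-new job.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of the HPC SDK: a snapshot of the SOA
// request counters found in a job's custom properties.
public sealed class SoaRequestCounts
{
    public long Calculating { get; private set; }
    public long Calculated { get; private set; }
    public long Faulted { get; private set; }
    public long PurgedProcessed { get; private set; }

    // Builds a snapshot from name-value pairs copied out of the result of
    // ISchedulerJob.GetCustomProperties(). Keys that are missing (as on a
    // brand-new job) or whose values are not integers are read as zero.
    public static SoaRequestCounts FromProperties(
        IEnumerable<KeyValuePair<string, string>> properties)
    {
        var lookup = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (var pair in properties)
        {
            lookup[pair.Key] = pair.Value;
        }

        return new SoaRequestCounts
        {
            Calculating = ReadCount(lookup, "HPC_Calculating"),
            // Assumed spelling of the "calculated" key.
            Calculated = ReadCount(lookup, "HPC_Calculated"),
            Faulted = ReadCount(lookup, "HPC_Faulted"),
            PurgedProcessed = ReadCount(lookup, "HPC_PurgedProcessed"),
        };
    }

    // Assumption: the value is an integer serialized as a string.
    private static long ReadCount(IDictionary<string, string> lookup, string key)
    {
        string raw;
        long value;
        bool ok = lookup.TryGetValue(key, out raw)
            && long.TryParse(raw, NumberStyles.Integer,
                             CultureInfo.InvariantCulture, out value);
        return ok ? value : 0;
    }
}
```

To use it, copy the name and value of each item returned by GetCustomProperties() into KeyValuePair<string, string> instances, pass them to FromProperties, and repeat on whatever polling interval suits your application.  Keeping the parsing separate from the scheduler call makes it easy to test without a cluster.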

With all that said, I should also say that our other SOA HPC use case monitors the state of the tasks, and is capable of more detailed real-time information.  We do this by creating our own ChannelFactory and channels.  By using that, the requests are not “fire and forget” – we get results back from each request individually as it completes.  We know how many outstanding requests there are, and how many have completed.  If we wanted to, we could use the same solution presented for the BrokerClient to find out how many are in the “calculating” state.
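
The bookkeeping side of that approach can be sketched independently of the channel code.  This is an assumption about how one might count requests, not a description of our implementation, and RequestTracker is a hypothetical name.  The idea is to bump one counter when a request is sent and another when its result comes back.

```csharp
using System.Threading;

// Hypothetical helper, not part of the HPC SDK or WCF: thread-safe counters
// for requests sent over your own channels.
public sealed class RequestTracker
{
    private long sent;
    private long completed;

    // Call once for each request handed to a channel.
    public void OnSent()
    {
        Interlocked.Increment(ref sent);
    }

    // Call from the completion callback of each request.
    public void OnCompleted()
    {
        Interlocked.Increment(ref completed);
    }

    // Number of requests whose results have come back.
    public long Completed
    {
        get { return Interlocked.Read(ref completed); }
    }

    // Requests sent but not yet answered. "completed" is read before "sent"
    // so that a concurrent update cannot make the result negative; the value
    // is still only a point-in-time approximation.
    public long Outstanding
    {
        get
        {
            long done = Interlocked.Read(ref completed);
            long total = Interlocked.Read(ref sent);
            return total - done;
        }
    }
}
```

Interlocked is used because completion callbacks typically arrive on different threads from the one sending requests.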

One last disclaimer:  These “Custom Properties” are not documented, but they are publicly exposed.  Microsoft could change them.  If they ever do, I hope they would consider it a breaking change, and document it.  There are no guarantees of that, so use discretion when considering this solution.
