A blog focused on explaining the new features in v2 and how to develop for and integrate with the HPC 2008 scheduler. This blog is owned by a group of Windows HPC Server 2008 development professionals.
7/31/2008
I was asked today where there was a quick explanation of how to submit a job from the command line, and I realized that I didn't have a really good answer. So I figured I'd do a blog post (first in a while, I know!) on how to do this.
Starting Simple
Let's say you have an application called "Divide.exe" which takes two arguments, "-Numerator X" and "-Denominator Y". So for example, to divide 3 by 6 you would run the command:
divide.exe -Numerator 3 -Denominator 6
So now you want to run this thing on a single processor on some node in the cluster. It's really easy! You can just do:
job submit divide.exe -Numerator 3 -Denominator 6
No problem, right?
Getting Parallel
Of course if you're using a cluster, you don't just want to run one command line on one processor of one machine. You want to run in parallel! There are a few ways that you can create parallel jobs depending on the way that your application works.
The two most common ways to do this in HPC are using MPI or using Parameter Sweeps.
Submitting MPI Jobs
Submitting MPI jobs is just as easy as submitting any other job! You simply do a job submit followed by your MPI command line. The number of options offered by MPI is astounding (try mpiexec -help3 for more on that) and will probably be covered in a future post, but the basic command line to submit a 16-processor MPI job would be:
job submit /numcores:16 mpiexec MpiDivide.exe -Numerator 3 -Denominator 6
Submitting a Parameter Sweep
We use the term "parameter sweep" to refer to a job which runs a single, serial application many times, achieving parallelism by running many instances of it at the same time. In HPC Server (and in the Compute Cluster Pack), this is accomplished by creating a job with many tasks. So for example, you could create a job that divided 3 by 6, 9, and 12 by doing the following (where X is the job ID returned by the first step):
job new /jobname:"My Parameter Sweep Job"
job add X divide.exe -Numerator 3 -Denominator 6
job add X divide.exe -Numerator 3 -Denominator 9
job add X divide.exe -Numerator 3 -Denominator 12
job submit /id:X
New in HPC Server 2008, we have a much simpler (and faster!) way of running parameter sweeps using wildcards! This approach will save you a lot of time, since these sweeps are much easier to create and because the Job Scheduler needs to store much less information for a Parametric Task than it does for a job with many distinct tasks. Here is how to do the same thing as above using this new technique:
job submit /parametric:6-12:3 divide.exe -Numerator 3 -Denominator *
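To make the wildcard substitution concrete, here is a minimal Python sketch of how a sweep spec such as 6-12:3 might expand into individual command lines. It is an illustration only, not the scheduler's implementation. The function name expand_sweep is hypothetical, and the sketch assumes the spec has the form start-end:increment with an inclusive end value, and that every * in the command line is replaced by the current index.

```python
def expand_sweep(spec: str, command: str) -> list:
    """Expand a 'start-end:increment' spec into one command line per step.

    Assumptions (not taken from the scheduler itself): the end value is
    inclusive, the increment defaults to 1, and a bare 'end' means 1-end.
    """
    range_part, _, inc_part = spec.partition(":")
    increment = int(inc_part) if inc_part else 1
    if increment < 1:
        raise ValueError("increment must be a positive integer")
    if "-" in range_part:
        start_text, end_text = range_part.split("-", 1)
        start, end = int(start_text), int(end_text)
    else:
        start, end = 1, int(range_part)
    # Substitute the sweep index for every wildcard in the command line.
    return [command.replace("*", str(i)) for i in range(start, end + 1, increment)]


if __name__ == "__main__":
    for line in expand_sweep("6-12:3", "divide.exe -Numerator 3 -Denominator *"):
        print(line)
```

Run as a script, this prints the same three divide.exe command lines that the job add example created by hand, with denominators 6, 9, and 12.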
For more info on how to use the new HPC Server parametric sweeps, see my earlier blog post, Making a Clean Sweep with Windows HPC Server 2008.
For more info on the command line tools in general, check out our Command Line Reference. Unfortunately it's not yet updated for v2 . . . but there should be a new one out on the web in just a few weeks!
6/19/2008
We've posted a white paper on how Job Templates can help you manage your cluster. You can check out the white paper here:
5/15/2008
Parameter Sweeps are one of the most common types of jobs that get run on HPC clusters, and we've done some work in Windows HPC Server 2008 to make them easier (and faster) than ever. Today I'll dig in a little on what's different and how to take advantage of these new features.
For those familiar with other scheduling products, you may recognize these as very similar to "Job Arrays."
What is a Parameter Sweep?
Simply put, a parameter sweep is when you run a single command many times over a set of different input parameters. Problems which can be solved in this way are very common. They're also a great use of HPC clusters, since parameter sweeps are inherently embarrassingly parallel; namely they can be run in parallel with little or no effort and scale almost limitlessly.
In general, Parameter Sweeps take the form of a single command line which is run N times, with different input, output, and/or command line arguments for each of the N steps. For example, think about an application called FileZipper.exe that takes an input file and generates a compressed output file. To run it on 100 data files, you'd basically want to do something like:
FileZipper.exe <FileToZip1.dat >ZippedFile1.zip
FileZipper.exe <FileToZip2.dat >ZippedFile2.zip
FileZipper.exe <FileToZip3.dat >ZippedFile3.zip
…
FileZipper.exe <FileToZip100.dat >ZippedFile100.zip
These instances can all run independently of one another, and you can run as many in parallel as you have processors to do the compression with.
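The independence of the steps is what makes this work, and a small Python sketch can show the idea outside of any scheduler. Everything here is an assumption for illustration: zip_step is a hypothetical stand-in for FileZipper.exe that compresses generated in-memory data rather than real files, and a local thread pool sized to the processor count plays the role of the cluster.

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor


def zip_step(index):
    """Stand-in for one sweep step: compress the data belonging to one index."""
    data = ("contents of FileToZip%d.dat\n" % index * 1000).encode()
    return index, len(zlib.compress(data))


def run_sweep(start, end, workers=None):
    """Run every step from start to end (inclusive) on a pool of workers.

    No step reads another step's result, so they can run in any order and
    as many at a time as there are workers.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(zip_step, range(start, end + 1)))


if __name__ == "__main__":
    sizes = run_sweep(1, 100)
    print("compressed", len(sizes), "inputs")
```

Because the steps share nothing, adding workers needs no change to zip_step itself, which is the property the scheduler relies on when it spreads a sweep across nodes.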
Make sense? If it does, then you already understand pretty much everything there is to know about Parameter Sweeps.
What's different in Windows HPC Server 2008?
In the Compute Cluster Pack (our product from 2005), parameter sweeps could be generated pretty easily from the UI by inputting the Start Index (the number to start counting from), End Index (the last number to use), and the Increment (the number to add for each step in the sweep). Using the UI to create such a sweep would then create N individual tasks in the job. This was useful, but had a number of downsides, namely:
- You're storing a lot of repeated information in the scheduler database
- If you wanted to change a part of your sweep, you had to change every task
- Large sweeps became very unwieldy to view in the UI or at the command line
In Windows HPC Server 2008, we add a new type of task called a Parametric Task:

Figure 1: Adding a Parametric Task to a Job
Now, these tasks are stored as a unit, which makes storing, editing, managing, and monitoring them a snap! Let's go ahead and give it a try . . .
Creating a Parameter Sweep
To create your first parameter sweep, open up the HPC Job Manager. The simplest way to create a sweep is to click on the Parametric Sweep Job link in the right-hand Actions Pane.
Let's create a sweep in the form of the example up above; namely, a sweep that zips up 100 files:
- Go ahead and provide a Name like "File Zipper".
- Set a Start Value of 1 and an End Value of 100.
- Leave the Increment Value to be 1 (since we'll be counting up by 1's).
- Enter your Command Line, in this case "FileZipper.exe".
- Set the Standard Input and Standard Output as above:
  - Stdin: FileToZip*.dat
  - Stdout: ZippedFile*.zip
- Check the preview box at the bottom of the dialog to see what your sweep will look like, then go ahead and submit.
You should end up with something that looks like this:

Figure 2: Creating a Parameter Sweep Task
Tracking Your Sweep's Progress
If you check the job list, you should now see that you've submitted a job with a single task in it. But actually, you can easily track each task individually by checking the box labeled Expand parametric tasks. This allows you to track your sweep as a unit, but dig in on failures or results for individual steps.
Doing that From the Command Line
Of course you can do the same thing from the Command Line:
C:\>job submit /parametric:100 /StdIn:"FileToZip*.dat" /StdOut:"ZippedFile*.zip" FileZipper.exe
Or from PowerShell:
PS> New-HpcJob | Add-HpcTask -Parametric -Start 1 -End 100 -Stdin "FileToZip*.dat" -Stdout "ZippedFile*.zip" -CommandLine "FileZipper.exe" | Submit-HpcJob
That's all for this time. Happy sweeping!
4/29/2008
Job Templates are one of the most whiz-bang new features in HPC Server 2008, but we've gotten a lot of feedback from the community that you don't really know what they're for or how to use them. So I figured a quick post here might help solve that problem by introducing you to one of the most powerful features of the v2 Job Scheduler.
What is a Job Template?
Simply put, a job template is a custom submission policy configured by the admin. Admins can create a number of different job templates and then let users pick the one that is right for them and their job (assuming they have the necessary permissions).
Job Templates vs. Job Queues
In many ways, HPC Server job templates are the same as the queues found in other scheduling products (like Platform's LSF), in that they allow you to:
- Partition the cluster
- Give different permissions to different jobs and users
- Provide handling for different types of jobs
That being said, job templates aren't queues because of the simple fact that in the end, all jobs submitted to the system end up in the same queue. We think that's a great design, because there is a single place to view all of the jobs, and a single, ordered queue of all the jobs waiting to execute.
How Job Templates Work
Job templates work by allowing administrators to provide Defaults and Constraints for every job that comes into the system. They are also ACL'd, meaning that administrators can control which sets of users can submit which types of jobs.
This diagram quickly explains how Job Templates are applied to a job:
Figure 1: The Scheduler Validates a Job Using a Job Template
How to Use Job Templates
The easiest way to explain how to use job templates is to make up a scenario and then show you how to enforce the policies required by that scenario. So let's do just that . . .
Say you have two groups of users:
- Paying Customers: These are the guys who paid for the cluster to be installed, and they get nearly unlimited rights to use the cluster as they see fit.
- Freeloaders: These are other employees at your company. They are allowed to use the cluster, but only in limited amounts and only if they don't get in the Paying Customers' way.
Let's go ahead and create two templates, one for each group, with the necessary settings and permissions to make everything work out nicely.
Step 1: Create a template for the Paying Customers group
First, let's head into the product and create a template for the "Paying Customers":
- Click on "Configuration" in the lower left of the HPC Cluster Manager, and then select "Job Templates" in the Navigation Pane.
- Now select "New" in the Action Pane to create a new Job Template.
- On the Welcome page, set the template name to "Paying Customers Template."
- Accept the defaults on the Job Run Times page, since these guys should be able to do whatever they want.
- On the Job Priorities page, set the Maximum priority to Maximum; this will allow paying customers to submit jobs with any priority that they'd like.
- Accept the defaults on the Project Names and Node Groups pages.
- Click finish to complete the wizard and create this job template.
Step 2: Create a template for the Freeloaders group
- Click on "New" again to create another new Job Template.
- Enter the name "Freeloaders" on the Welcome page.
- On the Job Run Times page, enter a maximum run time of 1 hour to prevent Freeloaders from submitting long-running jobs.
- On the Job Priorities page, set a default priority of Lowest and a Maximum Priority of Below Normal. This will ensure that Freeloaders' jobs always get a nice low priority . . . so Paying Customers will pass them in the queue and even pre-empt them if the pre-emption scheduling policy is enabled.
- Accept the defaults for Project Names and Node Groups.
- Click finish to complete the wizard and create this job template.
Step 3: Set permissions
The final step is to set the appropriate permissions so that no one can use job templates that they shouldn't.
- In the job templates view, highlight the "Default" template and select "Set Permissions."
  - Remove the "Users" group from this ACL so that no users can use the Default template any more.
- Now highlight the "Paying Customers" template and select "Set Permissions."
  - Remove the "Users" group from this ACL.
  - Add the Paying Customers group (you can create a new local group and manage the users in it by going to Computer -> Right-click -> Manage -> Users and Groups from the start menu), and give them the Submit Job permission.
- Now highlight the "Freeloaders" template and select "Set Permissions."
  - Remove the "Users" group from this ACL.
  - Add the Freeloaders group (you can create a new local group and manage the users in it by going to Computer -> Right-click -> Manage -> Users and Groups from the start menu), and give them the Submit Job permission.
Step 4: Profit
You're actually done. At this point, members of your local "Freeloaders" group can only submit using the Freeloaders template. This means they can't submit jobs with a priority above Below Normal, and they can't submit jobs that run for more than 1 hour. Sucks to be them, huh? Meanwhile members of the "Paying Customers" group can pretty much do whatever they want. Users who belong to both groups can use either job template.
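The way a template fills in defaults and then enforces constraints can be sketched in a few lines of Python. This is a simplified model, not the scheduler's code: the dictionary keys, the apply_template function, and the TemplateError class are all hypothetical, the priority names and their ordering are assumed, and the sketch assumes that a job which leaves its run time unset simply inherits the template's maximum.

```python
# Assumed ordering of priority levels, lowest first.
PRIORITIES = ["Lowest", "BelowNormal", "Normal", "AboveNormal", "Highest"]


class TemplateError(ValueError):
    """Raised when a job violates one of the template's constraints."""


def apply_template(job, template):
    """Return a copy of job with defaults filled in, or raise TemplateError."""
    result = dict(job)
    # Defaults: only applied to properties the submitter left unset.
    result.setdefault("priority", template["default_priority"])
    result.setdefault("runtime_minutes", template["max_runtime_minutes"])
    # Constraints: reject anything outside the allowed range.
    if PRIORITIES.index(result["priority"]) > PRIORITIES.index(template["max_priority"]):
        raise TemplateError("priority %s exceeds %s" % (result["priority"], template["max_priority"]))
    if result["runtime_minutes"] > template["max_runtime_minutes"]:
        raise TemplateError("run time exceeds %d minutes" % template["max_runtime_minutes"])
    return result


# The Freeloaders settings from Step 2, expressed in this sketch's format.
FREELOADERS = {
    "default_priority": "Lowest",
    "max_priority": "BelowNormal",
    "max_runtime_minutes": 60,
}

if __name__ == "__main__":
    print(apply_template({}, FREELOADERS))
    try:
        apply_template({"priority": "Normal"}, FREELOADERS)
    except TemplateError as err:
        print("rejected:", err)
```

In this model a Freeloader who submits an empty job gets Lowest priority and a 60 minute limit, while asking for Normal priority is rejected at submission time rather than silently lowered.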
Step 5: Getting more advanced
There are actually many more advanced things that you can do with Job Templates, since they can be used to default and constrain any job property. For an example, let's say you wanted to change things up by saying that Freeloaders couldn't submit jobs which required Exclusive use of a node. This can easily be done!
- Highlight Freeloaders in the job template view and click "Edit."
- Click the Add button to add a constraint, and select the Exclusive constraint.
- Highlight the Exclusive constraint in the Job Template Details window to show the settings for this Job Property.
- In the Details for the Exclusive constraint, set the Default Value to "False," and set the Valid Values to include only "False."
- Hit Save.

Figure 2: Editing a Job Template
Now when Freeloaders submit a job, it will always be marked as non-exclusive (because Exclusive is False by default). If they try to mark it as exclusive, job submission will fail (since True isn't in the list of valid values for the Exclusive property).
4/23/2008
Do you want to access the HPCS 2008 Job Scheduler from other environments such as Java, Linux, etc? Or build Job Submission tools around a standard web services based interface?
Well, in the latest Community Technology Preview (CTP) released this week we have included a new feature – the HPC Basic Profile Web Service or the HPCBP for short – that can help you do just that! This is a web service, built using the Windows Communication Foundation (WCF) that provides access to some of HPCS 2008’s core job submission functionality. Through the HPCBP you are able to submit a job, discover a job’s status, discover a job’s properties, terminate a job, and find out information about the cluster that you are running on.
The primary motivation for this feature came from other groups in the HPC community who wanted a standard interface that allowed jobs to be passed between HPC resources. Over the last few years, using an open process within the Open Grid Forum (OGF), developers from industry and research, from both the open source and commercial software communities, have come to agreement on the web service interface and protocols that can provide the greatest interoperability. This set of specifications is encapsulated within the HPC Basic Profile 1.0.
More information relating to the HPCBP specification and its implementation and deployment within HPCS 2008 can be found here.
4/14/2008
This week, I’d like to take some time to explain how a new feature, Multi Level Resource Allocation, can help you get the most out of your applications.
The basic explanation for this feature is that when creating a job, you can choose at what granularity your job gets scheduled. This is as simple as picking from a drop down in the UI, but as with most choices, it deserves a bit of thought!

Figure 1: Setting the resource unit type on a job
The first question that pops to mind is: what exactly do Core, Node, and Socket mean?
· Node (a.k.a. host, machine, computer) refers to an entire compute node. Each node contains 1 or more sockets.
· Socket (a.k.a. NUMA node) refers to a collection of cores with a direct pipe to memory. Each socket contains 1 or more cores. Note that this does not necessarily refer to a physical socket, but rather to the memory architecture of the machine, which will depend on your chip vendor.
· Core (a.k.a. processor, cpu, cpu core, logical processor) refers to a single processing unit capable of performing computations. A core is the smallest unit of allocation available in HPC Server 2008.
Next, let me explain how resources actually get allocated to your job. To do so, I’ll refer to this handy diagram (labeled as Figure 2 if I’ve got my post to publish correctly).

Figure 2: Multi Level Resource Allocation at work
In the above example, job J1 requested allocation at the Socket level. This may mean it has a single task that requires 3 sockets, or many tasks which each require 1 socket. The scheduler has reserved 3 sockets for it (and since it’s running on quad-core sockets, it’s implicitly been allocated 12 cores). Assuming it is a job with many single-socket tasks, the scheduler will start a single task per socket in the job’s allocation.
Job J2, on the other hand, requested allocation at the Node level, and has been allocated a single node (and implicitly, 16 cores). The scheduler will thus start 1 task on each node in the job’s allocation. No other jobs or tasks can be started on that node, so it’s quite similar to using the task Exclusive property.
Job J3 has requested Core allocation and, as shown above, has been allocated 4 cores. The scheduler starts 1 task per core.
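The arithmetic behind these implicit core counts is easy to check with a short Python sketch. It assumes every node looks like the ones in the example, with 4 quad-core sockets for 16 cores per node; the NodeShape class and the implied_cores function are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeShape:
    """Hardware layout of one compute node (assumed identical on every node)."""
    sockets: int
    cores_per_socket: int


def implied_cores(unit_type, units, shape):
    """Return how many cores a job implicitly receives for its allocation."""
    cores_per_unit = {
        "Core": 1,
        "Socket": shape.cores_per_socket,
        "Node": shape.sockets * shape.cores_per_socket,
    }
    if unit_type not in cores_per_unit:
        raise ValueError("unknown unit type: %s" % unit_type)
    return units * cores_per_unit[unit_type]


if __name__ == "__main__":
    shape = NodeShape(sockets=4, cores_per_socket=4)
    print("J1:", implied_cores("Socket", 3, shape))  # 3 sockets -> 12 cores
    print("J2:", implied_cores("Node", 1, shape))    # 1 node    -> 16 cores
    print("J3:", implied_cores("Core", 4, shape))    # 4 cores   -> 4 cores
```

On differently shaped hardware the same request yields a different core count, which is why the unit type rather than a raw core count is what the job asks for.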
When should I use each level?
When to use each of these settings will depend on your application, and some experimentation is necessary. In general, the rule is:
· Use core allocation if your application is CPU bound; the more processors you can throw at it the better!
· Use socket allocation if memory access is what bottlenecks your application’s performance. Since how much data can come in from memory is what limits the speed of the job, running more tasks on the same memory bus won’t result in speed-up since all of those tasks are fighting over the path to memory.
· Use node allocation if some node-wide resource is what bottlenecks your application. This is the case with applications that rely heavily on access to disk or to network resources. Running multiple tasks per node won’t result in a speed-up since all of those tasks are waiting for access to the same disk or network pipe.
Some key facts:
· The unit type set on your job also applies to all tasks in that job (i.e. you can’t have a job requesting 4 nodes with a bunch of tasks requesting 2 cores each).
· You can still use batch scripts or your application’s mechanisms to launch multiple threads or processes on the resources that your job is allocated.
· By using these correctly, you can improve your cluster utilization since jobs are more likely to get only the resources they need. See Figure 2, where job J1 and job J3 can peacefully coexist on a node.
· This feature is explicitly designed to work with heterogeneous systems, namely those where your compute nodes have varying hardware. So a socket allocation job will still get a dedicated pipe to memory for each task whether you are running single-core, dual-core, or quad-core processors. A node allocation job will get a node per task, whether those nodes have 1 core or 16.
3/31/2008
For me, one of the most exciting new features in Windows HPC Server 2008 is our Windows PowerShell interface. It's not only incredibly powerful, it's quite easy to learn! And it's very useful for posting demos and examples up here. So I figured it made pretty good sense to have my first real posting here walk through the basics of using the PowerShell interface to the Windows HPC Job Scheduler.
If you're not familiar with PowerShell at all, you may want to try checking out one of the many PowerShell tutorials available out there on the net.
Let's start with the basics. First, let's create a new job (using the New-HpcJob cmdlet) called "PowerShell Test" that is limited to a 4 hour run time and has a priority of "Above Normal":
PS> New-HpcJob -Name "PowerShell Test" -RunTime "0:4:00" -Priority "AboveNormal"
Now our job is in the system, and we'll want to add a task. The best way to do that sort of manipulation in PowerShell is to grab hold of the actual object. But of course I forgot to do that when I ran my New-HpcJob command. So now I'll need to use the Get-HpcJob cmdlet to go grab my job out of the scheduler and assign it to my variable:
PS> $MyJob = Get-HpcJob -Name "PowerShell Test"
Now that the $MyJob variable has my job in it, I can go ahead and add a task. I'll just use a simple task . . . one that runs "dir" so I can see what I have stored in my home directory on the compute node. I can use pipes to send my job into the Add-HpcTask cmdlet:
PS> $MyJob | Add-HpcTask -WorkDir "%UserProfile%\Documents" -Command "dir"
Great! Now my job is ready to go to the cluster. There are actually two ways I can submit my job. I can use the submission cmdlet (Submit-HpcJob), or I can call the Submit method on my job object:
PS> $MyJob | Submit-HpcJob
or
PS> $MyJob.Submit()
Now if you want to go ahead and see what happened to your job, you'll have a two step process. First, call the Refresh method to update your variable with data from the server:
PS> $MyJob.Refresh()
That done, you can use the Get-HpcTask command to view the details of the tasks in your job, and the Format-List (aliased to fl) command to make that output more readable:
PS> $MyJob | Get-HpcTask | Fl
By now you might be saying, "So what?" Well, I do still have one trick up my sleeve. You can actually do all that with one line of PowerShell using piping. Go ahead and try this:
PS> $MyJob = New-HpcJob -Name "PowerShell Test" -RunTime "0:4:0" -Priority "AboveNormal" | Add-HpcTask -WorkDir "%UserProfile%\Documents" -Command "dir" | Submit-HpcJob | fl
Pretty neat, eh?
Not to be outdone by some of the other parts of the Windows HPC Server product, we wanted to start a blog to get you information on the Windows HPC Server job scheduler, including details on:
- What's new in HPC Server 2008
- How to use different interfaces to the scheduler
- How the various job scheduling policies can help you solve your workload management problems
- How to configure and customize the HPC Server job scheduler
Stay tuned here for posts on these and other topics. If there's anything you'd like some more details on, please leave a comment and let me know . . . I'll do my best to serve all requests!