Pharos Blueprint Enterprise and Load Balancing Solutions

Purpose

The purpose of this document is to describe Pharos Blueprint Enterprise configurations, considerations, and troubleshooting methods when placed behind a Load Balancing solution. This article does not imply our support of the load balanced configuration; it simply describes how it works within that configuration. Some implementations may require certain features of a load balancing product, like Direct Server Return (DSR; Pharos iMFP for Xerox and Konica Minolta will need this), in order to function.

Background

The Pharos Blueprint Enterprise software architecture follows a Parent:Child hierarchy. The ultimate parent in the configuration is the Analyst server, and its child is the Collector server. Continuing that theme, the Collector serves as the parent for two other components:

Secure Release Here terminals (both physical and embedded options)
Tracker/PrintScout clients

At a simple level, it ends up looking like this:

Figure: A typical Pharos Blueprint configuration. The Analyst (left) is parent to two Collectors (middle), which are in turn parents to workstation Trackers and Terminals.

This architecture is great for small to medium sites where servers may be physically located in the building(s) where people are located. But larger, or highly-distributed, sites have different needs. Typically, servers are all located in data centers that are centralized to one or a handful of geographies, while the user and device population is far-flung (often internationally). Add to this a desire for automated client software distribution via Microsoft SCCM, LANDesk, or IBM Tivoli (or others), and a requirement to ensure that the system is generally "up", and the architectural design begins to look rather nightmarish.

This is where load balancer solutions become very attractive. First, most enterprise organizations already utilize load balancer solutions (F5 Networks and Radware are examples of solutions providers) for web servers, as they are well suited to managing large HTTP server farms. Second, the configuration options within the solutions provide for an effective means of managing geographically-distributed users into specific load balancing groups. And finally (not really finally, but I'm keeping this part short), they simplify the deployment model. How load balancing works is out-of-scope for this publication, but at the most fundamental level, it aims to ensure that the target systems behind it all receive an equal amount of network traffic, usually counted in sessions.

Technical Note
Load Balancing oftentimes seems doubly-attractive because it looks like it is providing both a seamless deployment and operations model as well as a Disaster Recovery strategy. It is not usually intended as a disaster recovery solution, however. As part of its function, the load balancer does perform a health check (normally as a heartbeat to a named TCP or UDP port listening on the server), but only to ensure that it directs inbound traffic to a device that is available. Anything on that "down" server is lost, potentially with no option of restore. That one important gap disqualifies it as a disaster recovery solution. If anything, it simply extends availability, much like server clustering does.
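As a rough illustration of the health check mentioned above, the following Python sketch probes a TCP port and reports whether anything answers. The function name, host, and port are placeholders chosen for this example; real load balancers ship their own monitors, and this is not the API of any load balancing or Pharos product.

```python
import socket


def tcp_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A passing check only says that the port answers. It says nothing about the spool files or other data held on that server, which is why a health check alone does not amount to disaster recovery.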

Warning
Beginning with Windows Server 2003, Microsoft introduced a feature called Network Load Balancing, or NLB. NLB is not supported for print servers, nor does Pharos Blueprint work within its framework. In order to load balance a Blueprint system, a third-party provider must be implemented. Why? Well, in print serving, the session between the client and the server is stateful, meaning that information about the connection is maintained by at least one side (client, or server, or both). This allows driver changes to be propagated to clients and ensures that print jobs that take some time to spool can do so successfully. NLB, on the other hand, is stateless: all that is important is that there is an output on the other end for any input. Further, Microsoft NLB does not allow for "session stickiness", which is required for terminal sessions.

Working Blueprint Into The Equation

The simplest load balanced configuration creates one load balancing group and targets all installed Collectors as connection candidates in a round-robin fashion. Larger implementations, particularly those where Collectors are spread across two or more data centers, are split into smaller balance groups that are controlled by an over-arching group that is configured to either round-robin between the subgroups or create a geographical distribution. For the purposes of this publication, I'm going to describe configuration in terms of two different Blueprint installation objectives:

Print Monitoring and Policy Print
Secure Release Here and Device Management

Print Monitoring and Policy Print Configuration

In this configuration, Blueprint Tracker or PrintScout is installed on users' computers. Print jobs are monitored, managed with Policy Print, and recorded, with a day's activity uploading to the parent Collector, which in turn uploads a larger batch of client jobs to the Analyst for reporting purposes. The direct approach to balancing is to create one load balance group and add all Collectors to the group, which is then configured for round-robin client access. The interval defined for session stickiness (in other words, how long the client keeps reaching the same server, regardless of the traffic distribution) can be very low.
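To make round-robin access and the stickiness interval concrete, here is a toy Python model. The class name, the Collector names, and the choice to measure stickiness from the client's most recent request are all assumptions for illustration; actual load balancers define and configure persistence in their own ways.

```python
from itertools import cycle


class RoundRobinGroup:
    """Toy round-robin balance group with a session stickiness interval."""

    def __init__(self, collectors, sticky_seconds):
        self._rotation = cycle(collectors)
        self._sticky_seconds = sticky_seconds
        self._sessions = {}  # client -> (collector, time of last request)

    def route(self, client, now):
        """Return the Collector that serves `client` at time `now` (in seconds)."""
        session = self._sessions.get(client)
        if session is not None and now - session[1] < self._sticky_seconds:
            collector = session[0]  # still sticky: reuse the same Collector
        else:
            collector = next(self._rotation)  # new or expired: next in rotation
        self._sessions[client] = (collector, now)
        return collector
```

With a short interval, a Tracker that uploads once a day will usually land on a different Collector each time, which is acceptable for this configuration.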

When deploying clients, use the DNS name created for the load balance group or, if preferred, a DNS CNAME that points to the name of the load balance group.

Considerations

If you wish to use HTTPS rather than HTTP as the protocol for client communication, install a SAN (Subject Alternative Name) certificate where the server's actual domain name is the Subject, with the other DNS name(s) defined as alternates. If the server name requested by the client does not match any name in the certificate, certificate validation will fail, and the client upload to the Collector may fail with it. Although it seems easier to use a wildcard certificate (where *.mycompany.org is used, to basically say "anything within the .mycompany.org domain is trusted"), some systems have specific rules which make their use either difficult or unsupported.
The load balance solution need only be configured to route TCP ports 80 and/or 443. TCP 443 is needed only if the Trackers are being configured to communicate over SSL.
There is a high probability that a client will make a connection to more than one Collector over its lifetime. When a Collector uploads its batch file to the Analyst, it also posts the "Last Heartbeat" and "Last Upload" information for insertion into the database. If the dates for either within the database are newer than those submitted by a Collector (because a Collector with more recent information completed its batch upload first), you will see errors in the Collector upload log file indicating that the machine upload for that client failed because newer information is already in the database. This is expected, but can sometimes raise concern that there is a systemic problem; rest assured, there is not. This type of activity will, however, increase import times.
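The "newer information is already in the database" behavior from the last consideration can be modelled with a few lines of Python. This is only a sketch of the rule as described above, with a dictionary standing in for the database; the function and variable names are hypothetical, and the actual Analyst import logic is not documented here.

```python
from datetime import datetime


def apply_machine_upload(db, client, last_heartbeat, last_upload):
    """Store the submitted dates unless the database already holds newer ones.

    `db` maps client name -> (last_heartbeat, last_upload). Returns True when
    the record is stored and False when it is rejected as stale.
    """
    current = db.get(client)
    if current is not None and (current[0] > last_heartbeat or current[1] > last_upload):
        return False  # newer information already recorded: expected, not a fault
    db[client] = (last_heartbeat, last_upload)
    return True
```

If one Collector uploads a client's record dated January 2 and another Collector then uploads the same client's record dated January 1, the second call returns False. That is the harmless error that shows up in the upload log.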

Secure Release Here and Device Management Configuration

Integrating Secure Release Here is the #1 reason load balancing is considered in a configuration. Its popularity is due to several key contributors:

Load balancing simplifies print queue deployment
Load balancing simplifies terminal deployment and configuration

Queue deployment in a standard configuration would require that the users' locale be considered so that unnecessary WAN traffic is not generated (in other words, using "best effort" to keep the jobs as local as possible). This can sometimes create complex logon scripts that are difficult to troubleshoot should distribution problems arise, and that are ultimately dependent upon the information within a directory service (Microsoft Active Directory, LDAP, or other X.500 platform like Novell eDirectory) being accurate. Placing the print servers/Collectors in a load balanced group makes queue deployment as easy as specifying "\\loadbalancer.company.com\MySecureQueue" in the logon script and moving on. If the load balancer's configuration is geographically constrained, the sorting is done by the load balancer, not the script.

Similarly, since load balancing is so web server-friendly and the Pharos EDI service is an IIS-hosted web service, terminal configuration is also brought under the load balancing umbrella. This can make the entire solution more administratively cumbersome than planned, however, and it is best to simply plan terminal deployment around the available pool of Collectors in the classic "Parent-Child" configuration.

Considerations

The load balancer will need to be configured to route TCP ports 80, 443, and 8080 to support necessary Blueprint client (Tracker, Client Tray, and Terminal) operations. If the terminals won't be using the load balancer, you can skip TCP 443, unless workstation-based Trackers are using SSL connections.
The load balancer will need to be configured to route TCP ports 139 and 445, as well as UDP ports 137 and 138, to support print queue connections.
The session length for client connections should be increased to some larger value (15 to 30 minutes) to accommodate the spooling of larger print jobs.
Be aware that Microsoft has made changes to how the Print Spooler responds to client requests utilizing different server names, and this affects how print sharing works when integrating a load-balancing solution. Microsoft Windows 2008 and newer servers will need significant configuration changes for network support (both at the local and client-serving levels) to accommodate the inbound client requests. A document, "TechNote - Configuring Windows Server for Load-balanced Print Sharing.pdf" has been attached to this article that describes these changes. Microsoft may release a new operating system, or patch to an existing operating system version, that removes any ability to share a load-balanced shared queue, rendering this type of solution useless. Pharos Systems does not encourage load balancing print queues for this very reason. Clients that have opted for this configuration have done so understanding the risk involved.
The shared queue(s) created for the load balance deployment must be of the exact same configuration across servers. See the Microsoft TechNet article "Migrate Print Servers" for additional information on how to use the available tools for this operation. NOTE: It is advised that the queue being exported not use a third-party port monitor. Install the queue to LPT1, configure as desired, and then export. Use the Secure Queue Configuration utility to secure the queues once they have been created.
If terminals will be configured to use the load balancer as the initial contact point, see "Managing SSL Certificates When Load Balancing Terminals" below and implement that as well.
Review "Playing Three-Card Monte and Print Jobs" below.
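The port requirements in the considerations above can be summarized as a small Python helper. The function and parameter names are invented for this sketch, and the conditions are one reading of the list above, so verify them against your own deployment before configuring the load balancer.

```python
def required_ports(terminals_use_lb: bool, trackers_use_ssl: bool, queues_use_lb: bool):
    """Return the set of (protocol, port) pairs to route through the load balancer."""
    # TCP 80 and 8080 support Blueprint client operations.
    ports = {("tcp", 80), ("tcp", 8080)}
    # TCP 443 can be skipped only when terminals bypass the load balancer
    # and workstation Trackers do not use SSL.
    if terminals_use_lb or trackers_use_ssl:
        ports.add(("tcp", 443))
    # Print queue connections need the Windows file and print sharing ports.
    if queues_use_lb:
        ports |= {("tcp", 139), ("tcp", 445), ("udp", 137), ("udp", 138)}
    return ports
```

For example, `required_ports(False, False, True)` leaves out TCP 443 but includes the four print sharing ports.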

Managing SSL Certificates When Load Balancing Terminals

In order to properly secure the communication of user credentials at the terminal, Blueprint is pre-configured to require that all terminal communications be made through a certificate-encrypted/decrypted channel. This is accomplished by configuring HTTPS within Microsoft IIS using a certificate obtained from a trusted authority (Pharos software does include its own Certificate Authority, and Pharos Technical Support can issue valid server certificates for this purpose, but we are not the ideal certificate solution). Because the certificate must support several names for identity purposes, a best practice is to use a "Subject Alternative Name" (or SAN) certificate. This type of certificate allows for several names in one certificate. Again, common certificate providers are able to issue SAN certificates. Contact Pharos Systems Technical Support if you wish to create a SAN certificate utilizing our Certificate Authority. We can provide both SHA-1 and SHA-256 SAN certificates. See "Pharos EDI: How to create a certificate with Subject Alternate Names (SANs)" to download a form that makes the request process much easier than attempting to provide a SAN certificate request file.
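To show why the certificate needs to carry every name a terminal might use, here is a simplified Python sketch of host name matching. It is an illustration only: the function names and host names are made up, and real TLS libraries apply stricter rules than this when validating names.

```python
def name_matches(requested: str, cert_name: str) -> bool:
    """Compare a requested host name with one certificate name (case-insensitive)."""
    requested, cert_name = requested.lower(), cert_name.lower()
    if cert_name.startswith("*."):
        # A wildcard covers exactly one leftmost label: *.example.org matches
        # a.example.org but not a.b.example.org or example.org itself.
        head, _, tail = requested.partition(".")
        return bool(head) and tail == cert_name[2:]
    return requested == cert_name


def certificate_covers(requested: str, subject: str, alt_names: list) -> bool:
    """Return True if the subject or any alternate name matches the requested host."""
    return any(name_matches(requested, name) for name in [subject, *alt_names])
```

A certificate with the Collector's own name as the subject and the load balance group's DNS name as an alternate covers both ways of reaching the server. A request for any other name is not covered.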

Once the appropriate certificate has been obtained and bound to HTTPS traffic, the Blueprint services will need to be stopped and restarted, allowing the change to apply to the system. This is best accomplished using the Blueprint Server Configuration utility. When the service restart is complete, a self-test will run, and the EDI Web Service test should pass.

This change will need to be made on every Collector that is part of a load balance group.

Playing Three-Card Monte and Print Jobs

Three-card Monte is a popular magician's act and, as it stands, a popular way for grifters to take your money on the streets. The premise is simple: three playing cards are drawn. Two of the cards match, and one is different (two are black, one is red; two are kings, one is an ace). The performer places them face-down on a table, shifts them around, and the "mark" (the one playing the game) has to pick the odd card. Various tricks are implemented to separate the mark from his cash, usually to much success on the part of the grifter.

Anyway, printing to a set of servers behind a load balance group is a lot like determining the "odd" card. Here's an illustration:

Figure: Which server gets the print job?

This can make troubleshooting difficult if the user reports that, after submitting a print job, it isn't available for print release. Let's review what happens under normal circumstances: In all cases, a terminal session sparks a "List Jobs for Terminal" event on the Collector to which it is connected. The Collector then queries the Analyst and gets a list of other Collectors that have received a print request from the user as well (which is potentially all Collectors in a load balanced configuration), and then the Collector contacts them for a job list as well. It looks a little bit like this:

Since a load balanced terminal is typically on a Collector that isn't housing the user's jobs, the terminal's Collector initiates a job transfer from the Collector that is hosting the print job. Then, once the job(s) is received, it is sent to the printer, like this:

Figure: The Collector > Collector job transfer process.

In some situations, that can be a lot of data being moved over the network. In Blueprint 5.1 Service Pack 3.1 the concept of "Direct Print," which removes the middle Collector job transfer, was introduced. This function is highly recommended in a load balanced Secure Release Here configuration. Enabling it is discussed on page 9 of the "Blueprint 5 1 Service Pack 3.1 New Features" PDF file that is included in the Service Pack 3.1 download under the "Cross server release support for Direct Printing" section. Beginning with Blueprint 5.2 Service Pack 3, direct printing is enabled by default.
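The job listing and release flow described in this section can be modelled in a few lines of Python. This is a conceptual sketch only: the data structures and function names are invented for illustration and do not reflect Blueprint's internal services, and the Direct Print path shown is an interpretation of "removes the middle Collector job transfer".

```python
def list_jobs_for_terminal(user, analyst_index, held_jobs):
    """Collect (collector, job) pairs for `user` from every Collector the Analyst names.

    analyst_index: user -> list of Collector names that received a job from the user.
    held_jobs:     Collector name -> {user: [job names]}.
    """
    jobs = []
    for collector in analyst_index.get(user, []):
        for job in held_jobs.get(collector, {}).get(user, []):
            jobs.append((collector, job))
    return jobs


def release_path(holding_collector, terminal_collector, printer, direct_print=False):
    """Return the hops a spool file takes when it is released."""
    if direct_print or holding_collector == terminal_collector:
        return [holding_collector, printer]
    # Without Direct Print, the job first moves to the terminal's Collector.
    return [holding_collector, terminal_collector, printer]
```

In a round-robin group the holding Collector and the terminal's Collector usually differ, so without Direct Print most releases take the three-hop path.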

In the event that any troubleshooting has to occur in this configuration, all Collectors in the group must be reviewed to determine where the job was received and where the terminal is being hosted. This can extend the "time to resolution" for incidents related to missing jobs, non-printing jobs, terminal connection errors, and terminal configuration issues.

The Pharos Print Center web application provided in Blueprint 5.2 Service Pack 5.4, and continued in Blueprint 5.3, allows for more ready monitoring and troubleshooting for load balanced implementations. Its Server Groups, Health Overrides, and Event Log functions were developed specifically to assist customers using a load balanced configuration. Specific to troubleshooting, Event Log allows the administrator or support personnel to search user activity across multiple Collectors, reducing the time needed to find where a job "landed" when printed by the client computer.

Troubleshooting a Load Balanced Configuration

Aside from the redirection that a load balancer introduces, troubleshooting Blueprint operations is basically unchanged from a standard configuration. The only difference is that load balancing requires, as discussed in the section above, that every server within the group be reviewed, have its logging increased, and be observed during root cause analysis. And remember: if you need help, Pharos Technical Support and the Community are here to help!

What Happens To Secure Release Here When a Load Balanced Collector Goes Down?

This is perhaps the most common question raised when deciding to implement load balancing. To understand the impacts, we go back to Blueprint operation without load balancing:

When a user submits a print job to a Secured queue, the spool file is stored within the file system of the Collector hosting the queue.
The Collector receiving the print job sends a notification to the Analyst that the user has printed a job to its queue.
When a user authenticates at a terminal, the Collector hosting the terminal asks the Analyst for a list of all Collectors that may be holding a job for the user.
The Collector asks for a job list from each Collector in the list returned by the Analyst.

In the event that a user's job is being held on a Collector that goes down, a few things happen:

The list of Collectors returned by the Analyst contains the "down" Collector.
The down Collector will not respond to the "List Jobs" call by the terminal's Collector.
If the timeout for the Collector communications exceeds the "Inactivity Timeout" for the terminal (the default for this timeout is 30 seconds, and is configurable), the user's session at the device will be terminated without much fanfare. To the user, it will look like they did not log in successfully, or they may be presented with this error:
"Could not connect to net.tcp://BPCollector.yourcompany.com/PharosSystems/SecureRelease/Services/SecureReleaseService/JobService.svc. The connection attempt lasted for a time span of 00:00:21.0984664. TCP error code 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.1.100:808."
This error means "I tried to contact 'BPCollector.yourcompany.com' for a list of jobs for the user, but it was taking too long for a response, so I stopped."
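The timing in the last step is simple arithmetic, sketched below in Python. It assumes the job-list calls are made one after another, which this article does not confirm, and it borrows the roughly 21-second connection attempt from the sample error message. The function name is hypothetical.

```python
def session_survives(response_seconds, inactivity_timeout=30.0):
    """Return True if all job-list calls finish before the terminal times out.

    response_seconds: time each Collector takes to answer (or to fail),
    assuming the calls are made one after another.
    """
    return sum(response_seconds) < inactivity_timeout
```

Under these assumptions, one unresponsive Collector at about 21 seconds still fits within the default 30-second timeout, and the user would see the error. Two unresponsive Collectors would exceed it, and the session would simply end.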

This will also affect the Workstation Client (PSClientTray.exe) and the Secure Release Here Troubleshooter. The end result is that the user cannot release print jobs. To resolve this requires some effort on the SQL Server hosting the Blueprint database:

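-- Run against the Blueprint database. Replace BPCollector with the down Collector's name.
USE psbprint
GO
-- The subquery must match exactly one server; tighten the LIKE pattern if it matches more.
DELETE FROM UserPrintActivity
WHERE ServerId = (SELECT ServerId FROM Servers WHERE ServerName LIKE 'BPCollector%')
GO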

Change the value of BPCollector to the name of your server. This will remove the job records for all users that reference the down Collector and allow users to log in and see their list of jobs, except those that were on the down server. In most cases, those jobs may be permanently lost, depending on where the server failure occurred and how long it takes to resolve the problem and return the server to the environment.
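Because the LIKE pattern ends in a wildcard, it can match more than one server name, and SQL Server rejects an "=" comparison against a subquery that returns several rows. The Python sketch below shows the idea of previewing the matches first. It uses an in-memory SQLite table as a stand-in for the real Servers table, and the rows and helper name are made up for illustration.

```python
import sqlite3


def matching_servers(conn, name_pattern):
    """Return the server names that the LIKE pattern would select."""
    rows = conn.execute(
        "SELECT ServerName FROM Servers WHERE ServerName LIKE ? ORDER BY ServerName",
        (name_pattern,),
    ).fetchall()
    return [name for (name,) in rows]


# In-memory stand-in for the Servers table; these rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Servers (ServerId INTEGER PRIMARY KEY, ServerName TEXT)")
conn.executemany(
    "INSERT INTO Servers VALUES (?, ?)",
    [(1, "BPCollector01"), (2, "BPCollector02"), (3, "NYCollector")],
)
```

Here the pattern 'BPCollector%' matches two servers, while 'BPCollector01%' matches only one. Running the equivalent SELECT on the real database before the DELETE confirms that only the down Collector is targeted.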

Please see the attached file for more information.