IO Formatter

Software Design Case Study - Input & Output Formatter

The Problem

This was a sub-project in a larger application. The project had identified a collection of data as a "worklist", which contained information about sample tubes. Each worklist had a name and a number of items in the worklist. Each worklist contained a number of racks, and each rack contained a number of tubes.

The problem was that the physical representation on disk of the worklist had changed over time, yet the program had to be able to read any of the older formats. A newer format had also added error detection by inserting a cyclic redundancy check (CRC) at the end of the file. Yet, despite the file format, the information itself was the same, therefore the application design called for a polymorphic Input and Output Formatter, to perform the translation into and out of the physical file format.

The Design

During design, it became apparent that the printer was just another output format, so it was added as one. This allowed the same application code to be used to send the worklist to the printer.

First, the information that changed (besides the worklist data itself) was assessed:

Different file formats used different file naming conventions and different file extensions; yet, the main application code had to know what these file naming conventions were so that it could create a list of available worklists. In a previous version, this had been done using a switch{} statement in every location where the code had to make a decision based on the selected format. This had proved difficult to maintain.
When outputting, the code knew how many racks and tubes existed. However, when inputting, the number of racks and tubes might not be known until the entire worklist was read, depending on the format.
The types of errors that could occur varied dramatically by file format. The CRC-protected format also used multiple files - one for each rack. Other formats used a single file to contain the entire worklist. This created a wide variance in the error potential. When printing was added as an output format, the types of possible errors grew even more.

To handle the varying numbers and types of error conditions, a Generic Error Object was used. See the Error Object case study. Since worklist formatting errors were only one of a number of classes of errors, they had their own error object.

All files were ASCII text, so several file functions were common to all formats. Those functions that could or would be common were collected into a base class. There was no intention to ever instantiate an object of this base class, so the constructor was made protected.

Interface Definition

The interface was defined first, with all functions being stubbed for incremental implementation:

class CFormat
{
public:
virtual ~CFormat();

protected:
CFormat();

void LogError(ERRORTYPE dError, LPCSTR szErrorText);
void LogFatalError(ERRORTYPE dError, LPCSTR szErrorText);
CErrorObject* GetMyErrorObject();
unsigned long CalcCRC(void* pvBuff, long lSize, unsigned long uCRC);
// Other protected data members...omitted for brevity
};

This base class provided all the common functions, as well as controlled access to the error object. To provide a more convenient method of logging errors, the LogError() and LogFatalError() functions were added.
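The CalcCRC() signature, which takes a previous CRC value as its last argument, suggests a running checksum that can be fed one buffer at a time. The text does not show the algorithm or name the polynomial, so the following is only a sketch under an assumption: it uses the common CRC-32 (reflected polynomial 0xEDB88320). The name CalcCrc32 and the fixed-width types are illustrative and do not come from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for CFormat::CalcCRC().
// Assumption: CRC-32 with the reflected polynomial 0xEDB88320. The polynomial
// used by the original worklist format is not given in the text.
// Pass 0 as uCRC for the first buffer, then pass the previous return value
// to continue the same checksum over the next buffer.
std::uint32_t CalcCrc32(const void* pvBuff, std::size_t size, std::uint32_t uCRC)
{
    assert(pvBuff != nullptr || size == 0);
    const unsigned char* p = static_cast<const unsigned char*>(pvBuff);
    std::uint32_t crc = ~uCRC;               // undo the final inversion of the previous call
    for (std::size_t i = 0; i < size; ++i)
    {
        crc ^= p[i];
        for (int bit = 0; bit < 8; ++bit)    // process the byte one bit at a time
        {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}
```

Because the previous value is passed back in, a formatter could checksum a file line by line and compare the final value with the CRC stored at the end of the file. A table-driven version would be faster, but the bitwise form is easier to check by eye.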

The logic flow of information was different for input and output operations, but was consistent for all input operations and consistent for all output operations. So the CFormat class was used to derive two other base classes: CInputFormat and COutputFormat. Their functions were pure virtual, allowing polymorphism for the specific input and output formatters.

Again, because these classes were not to be instantiated, their constructors were made protected.

class CInputFormat : public CFormat
{
public:
virtual ~CInputFormat();
virtual BOOL OpenWorklist(LPCSTR szDir, LPCSTR szFileName, CWorklist* wl) = 0;
virtual BOOL CloseWorklist() = 0;
virtual CRack* OpenRack() = 0;
virtual BOOL CloseRack() = 0;
virtual BOOL ReadTubeData(CTube* pTube) = 0;

// Class builder to create the specific formatter.
// Creation will be delegated to a class factory.
static CInputFormat* GetInputFormatter(WLTYPE wlType);

protected:
CInputFormat();
// Other protected data members, omitted for brevity.
};

class COutputFormat : public CFormat
{
public:
virtual ~COutputFormat();
virtual BOOL OpenWorklist(LPCSTR szDir, CWorklist* wl) = 0;
virtual BOOL CloseWorklist() = 0;
virtual BOOL OpenRack(CRack* pRack) = 0;
virtual BOOL CloseRack() = 0;
virtual BOOL WriteTubeData(CTube* pTube) = 0;

// Output Format only methods - replaces use of switch{}
// for making format-dependent decisions.
virtual void DeleteRackFile(LPCSTR szDir, CRack* rack);
virtual LPCSTR GetWorklistFileName(CWorklist* wl) = 0;
virtual LPCSTR GetRackFileName(CRack* rack) = 0;
virtual WLTYPE GetMyType() = 0;
virtual LPCSTR GetMyWorklistExtension() = 0;
virtual LPCSTR GetMyRackExtension() = 0;

// Class builder to create the specific formatter
static COutputFormat* GetOutputFormatter(WLTYPE wlType);

protected:
COutputFormat();
// Other protected data members
};

The various Output Format Only methods represented the format-specific information that the main application had to know. In a previous application, switch{} statements had been used each time this information was needed. In this design, however, the output formatter was asked to provide the information. For some formats the information was irrelevant (such as filenames when the output format is the printer). In those cases, a NULL value would be returned. This design allowed more formats to be added without impacting the rest of the code.

However, now that switch{} statements were no longer used, the default case in the switch{} still had to be handled. This was done by providing a Null format. It was a formatter that did nothing - its functions were just stubs. It provided the same functionality as the "default" had provided in the previous switch{} statements.
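To make the Null format idea concrete, here is a minimal sketch. It uses a reduced COutputFormat with only three of the methods, and plain bool and const char* in place of BOOL and LPCSTR. The enumerator names, the ".wl1" extension, the helper HasWorklistFiles() and the stub return values are all assumptions for illustration; the text only says that the Null format's functions were stubs and that irrelevant information was returned as NULL.

```cpp
// Simplified stand-ins for the article's types (assumptions for this sketch).
enum WLTYPE { WL_NONE, WL_FORMAT1, WL_PRINTER };   // hypothetical enumerators
class CWorklist {};

// Reduced COutputFormat: only a few of the query methods are shown.
class COutputFormat
{
public:
    virtual ~COutputFormat() {}
    virtual bool OpenWorklist(const char* szDir, CWorklist* wl) = 0;
    virtual const char* GetMyWorklistExtension() = 0;
    virtual WLTYPE GetMyType() = 0;
protected:
    COutputFormat() {}
};

// A file-based format answers the questions the application used to
// answer with a switch{} on the format type.
class CWriteFormat1 : public COutputFormat
{
public:
    bool OpenWorklist(const char*, CWorklist*) override { return true; }  // file I/O omitted
    const char* GetMyWorklistExtension() override { return ".wl1"; }      // hypothetical extension
    WLTYPE GetMyType() override { return WL_FORMAT1; }
};

// The Null format: every function is a stub. It plays the role of the
// "default" branch of the old switch{} statements.
class CWriteNullFormat : public COutputFormat
{
public:
    bool OpenWorklist(const char*, CWorklist*) override { return false; } // assumption: nothing was opened
    const char* GetMyWorklistExtension() override { return nullptr; }
    WLTYPE GetMyType() override { return WL_NONE; }
};

// Client code asks the formatter instead of switching on the format.
bool HasWorklistFiles(COutputFormat& fmt)
{
    return fmt.GetMyWorklistExtension() != nullptr;
}
```

Client code such as HasWorklistFiles() never names a concrete format, so adding a new format means adding a class rather than editing every decision point.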

Creating separate input and output format base classes accommodated the different logic flow between reading and writing. When writing, the application already knew the worklist size, so the logic was designed as follows:

OpenWorklist(Worklist)
FOR each rack in Worklist DO
   OpenRack(Rack)
   FOR each Tube in Rack DO
      WriteTubeData(Tube)
   END FOR
   CloseRack
END FOR
CloseWorklist
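The write loop above might look roughly like this in C++. This is a sketch, not the original code: the container-based CWorklist, CRack and CTube types, the function WriteWorklist() and the call-counting formatter are hypothetical stand-ins, and bool replaces BOOL. A false return from any formatter call is used to break out of the loops.

```cpp
#include <vector>

// Hypothetical, simplified data types for this sketch.
struct CTube { int id; };
struct CRack { std::vector<CTube> tubes; };
struct CWorklist { std::vector<CRack> racks; };

// The five write operations from COutputFormat.
class COutputFormat
{
public:
    virtual ~COutputFormat() {}
    virtual bool OpenWorklist(const char* szDir, CWorklist* wl) = 0;
    virtual bool CloseWorklist() = 0;
    virtual bool OpenRack(CRack* pRack) = 0;
    virtual bool CloseRack() = 0;
    virtual bool WriteTubeData(CTube* pTube) = 0;
};

// Writes the whole worklist. Stops at the first failed call, but still
// closes whatever was opened. Returns false if any call failed.
bool WriteWorklist(COutputFormat& fmt, const char* szDir, CWorklist& wl)
{
    if (!fmt.OpenWorklist(szDir, &wl)) return false;
    bool ok = true;
    for (CRack& rack : wl.racks)
    {
        if (!fmt.OpenRack(&rack)) { ok = false; break; }
        for (CTube& tube : rack.tubes)
        {
            if (!fmt.WriteTubeData(&tube)) { ok = false; break; }
        }
        if (!fmt.CloseRack()) ok = false;
        if (!ok) break;
    }
    if (!fmt.CloseWorklist()) ok = false;
    return ok;
}

// Hypothetical formatter that only counts calls, handy for testing the loop.
class CCountingFormat : public COutputFormat
{
public:
    int racks = 0;
    int tubes = 0;
    bool OpenWorklist(const char*, CWorklist*) override { return true; }
    bool CloseWorklist() override { return true; }
    bool OpenRack(CRack*) override { ++racks; return true; }
    bool CloseRack() override { return true; }
    bool WriteTubeData(CTube*) override { ++tubes; return true; }
};
```

The loop depends only on the base class, so the same function can drive a file format, the printer, or the Null format.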

However, when reading, the application started with an empty worklist and had no way of knowing for certain how many racks and tubes there were until the entire worklist had been read. The logic was designed as follows:

Worklist = OpenWorklist
WHILE reading Worklist DO
   Rack = OpenRack   // This function returns NULL if no more racks exist
   IF Rack opened THEN
      FOR each Tube in Rack DO
         ReadTubeData
      END FOR
      CloseRack
   ELSE
      Flag that worklist reading is over
   END IF
END WHILE
CloseWorklist
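A C++ sketch of the read loop follows. Several details are assumptions, because the text does not specify them: the rack returned by OpenRack() is assumed to report its tube count (the hypothetical tubeCount member), the formatter is assumed to own that rack while the caller copies it, and a null return from OpenRack() is treated simply as the end of the worklist. ReadWorklist() and CFakeInput are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified data types for this sketch.
struct CTube { int id = 0; };
struct CRack { std::size_t tubeCount = 0; std::vector<CTube> tubes; };
struct CWorklist { std::vector<CRack> racks; };

class CInputFormat
{
public:
    virtual ~CInputFormat() {}
    virtual bool OpenWorklist(const char* szDir, const char* szFileName, CWorklist* wl) = 0;
    virtual bool CloseWorklist() = 0;
    virtual CRack* OpenRack() = 0;                  // nullptr when no more racks exist
    virtual bool CloseRack() = 0;
    virtual bool ReadTubeData(CTube* pTube) = 0;
};

// Reads racks until the formatter reports that none are left.
bool ReadWorklist(CInputFormat& fmt, const char* szDir, const char* szFileName, CWorklist& wl)
{
    if (!fmt.OpenWorklist(szDir, szFileName, &wl)) return false;
    bool ok = true;
    bool reading = true;
    while (reading && ok)
    {
        CRack* pRack = fmt.OpenRack();
        if (pRack != nullptr)
        {
            for (std::size_t i = 0; i < pRack->tubeCount && ok; ++i)
            {
                CTube tube;
                ok = fmt.ReadTubeData(&tube);
                if (ok) pRack->tubes.push_back(tube);
            }
            wl.racks.push_back(*pRack);             // copy: the formatter owns pRack here
            if (!fmt.CloseRack()) ok = false;
        }
        else
        {
            reading = false;                        // flag that worklist reading is over
        }
    }
    if (!fmt.CloseWorklist()) ok = false;
    return ok;
}

// Hypothetical in-memory formatter: serves racks with the given tube counts.
class CFakeInput : public CInputFormat
{
public:
    explicit CFakeInput(const std::vector<std::size_t>& sizes) : m_sizes(sizes) {}
    bool OpenWorklist(const char*, const char*, CWorklist*) override { m_next = 0; return true; }
    bool CloseWorklist() override { return true; }
    CRack* OpenRack() override
    {
        if (m_next >= m_sizes.size()) return nullptr;
        m_current = CRack();
        m_current.tubeCount = m_sizes[m_next++];
        return &m_current;
    }
    bool CloseRack() override { return true; }
    bool ReadTubeData(CTube* pTube) override { pTube->id = ++m_lastId; return true; }
private:
    std::vector<std::size_t> m_sizes;
    std::size_t m_next = 0;
    int m_lastId = 0;
    CRack m_current;
};
```

A real caller would also consult the error object after the loop, since a null rack alone cannot distinguish "no more racks" from a fatal error.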

The formatters themselves were responsible for logging their errors to the error object. The error object provided a LogError function for errors that just had to be noted while reading could still continue. The error object also provided a LogFatalError function that would cause the formatter's function to return a False or NULL value, which was used to break out of the FOR or WHILE loops. For brevity, the error logic is not shown in the pseudo-code above, but it followed my general Don't Try - DO precept.

To Instantiate a Formatter

The WLTYPE data type was added to define the logical worklist format. It was an enum, and it was passed to the static builder function so that the matching formatter could be obtained. I decided that the CInputFormat and COutputFormat base classes were all that the main application should ever know about. I did NOT want dependencies on the header files for the specific derived subclasses - that was information that the main application had no use for.

I gave each base class a static function to get the specific formatter. The application would access the formatters as follows:

CInputFormat* pInput = CInputFormat::GetInputFormatter( wlType );

COutputFormat* pOutput = COutputFormat::GetOutputFormatter( wlType );

Obviously, the abstract base class cannot instantiate the subclasses itself - that would require the header files for each subclass, creating a circular reference. However, the application did not have to know HOW the formatter created the format object. In fact, using this design, the main application was never aware that the formatter was polymorphic at all - which is just as it should be. The effect is that the base class is asked to polymorph itself.

Object creation was delegated to a class factory - a separate object that CInputFormat and COutputFormat used to create the class. Since only one class factory should exist, it was created as a Singleton - using the static Instance() method to obtain a pointer to the class factory. In addition, each of the specific formatters would be Singletons, so the class factory held pointers to each of the specific objects; after all, there was really no reason for a formatter to be created more than once. Since the main application should have no dependency on the class factory, the header files for CInputFormat and COutputFormat contained no references to the factory. The factory was called by CInputFormat in this fashion:

//-----------------------------------------------
CInputFormat* CInputFormat::GetInputFormatter(WLTYPE wlType)
{
// Get the class factory pointer - the factory is a Singleton
CFormatBuilder* cBuilder = CFormatBuilder::Instance();

// The builder returns the concrete formatter through a
// base-class pointer, so the caller gets a polymorphic object.
return cBuilder->GetInputFormatter(wlType);
}

//-----------------------------------------------
class CFormatBuilder
{
public:
COutputFormat* GetOutputFormatter(WLTYPE wlType);
CInputFormat* GetInputFormatter(WLTYPE wlType);
static CFormatBuilder* Instance();
static void Destroy();

protected:
CFormatBuilder(); // Prevents object instantiation, except by the Instance() function
virtual ~CFormatBuilder(); // Prevents accidental deletion of the pointer

private:
static CFormatBuilder* m_hInstance; // My own instance

// Instances of the specific formatters (each one is a Singleton):
CInputFormat* m_hCReadFormat1;
CInputFormat* m_hCReadFormat2;
CInputFormat* m_hCReadFormat3;
CInputFormat* m_hCReadNullFormat;
COutputFormat* m_hCWriteFormat1;
COutputFormat* m_hCWriteFormat2;
COutputFormat* m_hCWriteNullFormat;
COutputFormat* m_hCPrintFormat;
};

The Singleton

The book Design Patterns does a better job of explaining this pattern than I will, but basically the idea is to ensure that only a single instance of an object is ever created. That is done by delegating object creation to the object itself. Since the constructor is protected, the object cannot be instantiated by declaring a variable or by using "new" - a compiler error would result.

//-----------------------------------------------
// Singleton Object - Construction/Destruction
// Initialize the static CFormatBuilder instance pointer.
CFormatBuilder* CFormatBuilder::m_hInstance = NULL;

//-----------------------------------------------
// Access is only granted via the Instance() method.
// If object has not been created, it creates itself.
// If object already exists, it returns pointer to itself.
CFormatBuilder* CFormatBuilder::Instance()
{
if (m_hInstance == NULL)
{
m_hInstance = new CFormatBuilder;
}
return m_hInstance;
}

//-----------------------------------------------
// Destructor is protected to prevent accidental deletion.
// Object must be explicitly destroyed.
void CFormatBuilder::Destroy()
{
if (m_hInstance != NULL)
{
    // The CFormatBuilder destructor will destroy the individual format Singletons.
    delete m_hInstance;
    m_hInstance = NULL;
}
}

NOTE: this Singleton code is not thread safe. This application was not multi-threaded, but if it were, it would have to be made thread safe by enclosing the Instance() code inside a critical section. Otherwise, a context switch in mid-if could result in duplicate objects being created, with one pointer ending up lost when the m_hInstance pointer created by the second thread is overwritten after the first thread regains control.
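For illustration only, here is one way the thread-safe variant described in the note could look in standard C++11, using std::mutex with std::lock_guard in the role of the critical section. This is a sketch under that assumption, not the original code: the application was not multi-threaded, the m_mutex member is hypothetical, and the formatter members are omitted.

```cpp
#include <mutex>

class CFormatBuilder
{
public:
    // The lock is held for the whole check-and-create sequence, so a
    // context switch in mid-if cannot produce a second object.
    static CFormatBuilder* Instance()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_hInstance == nullptr)
        {
            m_hInstance = new CFormatBuilder;
        }
        return m_hInstance;
    }

    // Explicit destruction, guarded by the same lock.
    static void Destroy()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        delete m_hInstance;          // deleting a null pointer is harmless
        m_hInstance = nullptr;
    }

protected:
    CFormatBuilder() {}              // only Instance() may construct
    virtual ~CFormatBuilder() {}     // only Destroy() may delete

private:
    static CFormatBuilder* m_hInstance;
    static std::mutex m_mutex;       // hypothetical member, not in the original class
};

CFormatBuilder* CFormatBuilder::m_hInstance = nullptr;
std::mutex CFormatBuilder::m_mutex;
```

Taking the lock on every call is the simplest correct form. Pointers handed out before a Destroy() call still become dangling, so Destroy() should only run once no other thread is using the factory.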

UML Design

The static structure for the output formats is as follows (CPhysicalAspect was the client object that handled the worklist data). The structure for the input formats is similar: they are also derived from CFormat and also use CFormatBuilder as the class factory. CFormatBuilder was the only code that relied on a switch{} on the WLTYPE enum to decide which object to return. If the requested format was not applicable, then a NullFormatter was returned. For example, if asked to create the input formatter for the Printer, the NullFormatter would be returned.
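The factory's single switch{} might be sketched as follows. Everything here is reduced for illustration: the enumerator names and the empty formatter classes are hypothetical, the Singleton plumbing is left out, and std::unique_ptr stands in for the raw pointers that the original class deleted in its destructor.

```cpp
#include <memory>

enum WLTYPE { WL_FORMAT1, WL_PRINTER, WL_NONE };   // hypothetical enumerators

// Empty stand-ins for the formatter hierarchy.
class CInputFormat  { public: virtual ~CInputFormat() {} };
class COutputFormat { public: virtual ~COutputFormat() {} };

class CReadFormat1     : public CInputFormat {};
class CReadNullFormat  : public CInputFormat {};
class CWriteFormat1    : public COutputFormat {};
class CPrintFormat     : public COutputFormat {};
class CWriteNullFormat : public COutputFormat {};

// Reduced factory: the constructor is public here only to keep the sketch short.
class CFormatBuilder
{
public:
    CFormatBuilder()
        : m_read1(new CReadFormat1), m_readNull(new CReadNullFormat),
          m_write1(new CWriteFormat1), m_print(new CPrintFormat),
          m_writeNull(new CWriteNullFormat) {}

    // The only place in the program that switches on the format type.
    CInputFormat* GetInputFormatter(WLTYPE wlType)
    {
        switch (wlType)
        {
        case WL_FORMAT1: return m_read1.get();
        default:         return m_readNull.get();   // e.g. the printer cannot be read
        }
    }

    COutputFormat* GetOutputFormatter(WLTYPE wlType)
    {
        switch (wlType)
        {
        case WL_FORMAT1: return m_write1.get();
        case WL_PRINTER: return m_print.get();
        default:         return m_writeNull.get();
        }
    }

private:
    // One instance per formatter, created once and reused for every request.
    std::unique_ptr<CInputFormat>  m_read1;
    std::unique_ptr<CInputFormat>  m_readNull;
    std::unique_ptr<COutputFormat> m_write1;
    std::unique_ptr<COutputFormat> m_print;
    std::unique_ptr<COutputFormat> m_writeNull;
};
```

Because the default branch always returns a Null formatter, callers never receive a null pointer and need no special case for unsupported formats.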

The Implementation

The complete source code for the formatters is of little use, since the implementation of the various formats is of no value in this lesson.

The source code for the CFormat test shell has a useful CRC calculation algorithm as well as an implementation of the Generic Error Object. This is the same source code as for Case Study #1. It allows the user to type some text and calculate the CRC. It also allows the user to set error flags, which are displayed when the CRC is calculated. The purpose of the shell was to perform integration of the CFormat object and the Generic Error object.

Unit Testing

Unit testing began by creating the Null formatter and verifying that the main application logic ran without crashing. Then the simplest output formatter was created and a worklist written in that format. One format at a time, first the output and then the input, the formatters were created and tested. The only impact that adding formatters had on the main application logic was the need to add a user interface method to select the new format.