
From CPP to COM

 
Markus Horstmann
OLE Program Manager, Microsoft Corporation

Created: September 28, 1995


Revised: October 19, 1995

Markus Horstmann joined Microsoft in September 1995 as a Program Manager for
OLE. Prior to that, he worked for Información Selectiva, Mexico, leading the
design and development of their next-generation, real-time information
management platform, as well as leading the design, development, and roll-out
of a distributed, interactive financial application based on this platform. He
studied Informatics at the University of Dortmund, Germany. He is a Microsoft
Certified Systems Engineer and a Microsoft Certified Product Specialist for
Microsoft Excel.

Table of Contents

Topic                                                                 Associated Sample
Abstract
Introduction
DB: C++ Object Used in a C++ Client                                   DB
DB_CPPDLL: C++ Object in a DLL, Used by a C++ Client                  DB_CPPDLL
DB_VTBL: C++ Object Exposing "Pure" Abstract Base Class               DB_VTBL
DB_VTBLDLL: C++ Object in a DLL Exposing "Pure" Abstract Base Class   DB_VTBLDLL
DBAlmostCOM: C++ Object in a DLL, Loaded Through COM                  DBAlmostCOM
DBCOM: COM Object in a DLL, Loaded Through COM                        DBCOM
DBCOMMUL: COM Object with Multiple Interfaces                         DBCOMMul
DBCOMREM: COM Object in a Separate Process                            DBCOMRem
Conclusion
Bibliography

Abstract
The Microsoft® Component Object Model (COM) borrows a key part of its
architecture from standard implementations of C++. This article uses an existing
project written in C++ and turns it into a component-based design that runs the
object in a separate process, leveraging the COM infrastructure provided natively
on all newer Microsoft operating systems. The focus is on explaining and deducing
why and how features are implemented in COM, providing a robust, portable, and
transparent model for distributed component software.

Introduction

The idea for this technical article arose from a deep frustration about the
apparent complexity of COM and OLE, as they are presented in the OLE Design
Specification and in many later publications. After having studied these
publications and the online documentation many times, I finally understood how
everything fit together. This, in turn, illuminated how COM supplies a great
foundation for safe and robust interaction among objects.

Once I understood the basic structure of COM and how it interacts with and
supports objects, I realized that it would have been very helpful if, from the
beginning, the specification and product documentation had clearly separated the
COM discussions from the implementation of OLE features. Also, being a person
who needs to understand a system from the "bottom up," this separation would
have made OLE much easier to learn. This approach for explaining OLE and COM
is now taken more frequently; for example, the new edition of Kraig
Brockschmidt's Inside OLE does a very good job of explaining each of the
technologies separately.

Before his book was available, I was asked to give a class on COM and OLE. For
this class I implemented, among other things, a code sample that explains all the
design features of the basic COM architecture by turning a simple object and its
client, implemented in C++, first into a standard Windows® dynamic-link library
(DLL) architecture and then into a COM object. In this environment, C++ objects
just communicate directly in their own "language." [In fact, in my personal
opinion this technology will sooner or later (probably later) replace the current
way in which DLLs interface to the outside world (through exported entry points
that are bound at load time), although COM currently needs a few standard entry
points using conventional DLL technology. Objects could receive a parameter to
their standard entry point—that is, a pointer to the “Operating System Object”—
and access all the functionality through this object, using only COM Interfaces
(window creation, file access, and so on). Device drivers could be COM objects,
managed (dispatched?) by the operating system.]

I originally used the Microsoft Foundation Classes (MFC) to create a project to
accompany this article, consisting of a main directory and several subdirectories.
For ease of use, this has been split up into eight samples, each associated with a
major heading in this article. You may want to incorporate all the step-by-step
changes into a single directory, or you may copy each sample to a new directory
to keep the original version for reference. If you choose to copy the sample, you
will need to change the references to certain files to point to the new directory,
because Visual C++® will not always update them correctly.

The project uses MFC, but only to implement a fairly reasonable application and
user interface framework. All COM-related features are implemented in straight
C++ without MFC, and they can be carried over to any other framework or to an
application written directly against the Win32® API. The (trivial) implementation of the
object's functionality also takes advantage of MFC's array and string management
features, but this can easily be replaced.

The project is written using the Microsoft Visual C++ 2.2 development
environment and provides projects for Intel®-based Windows NT® or Windows
95 platforms. For the last sample, which illustrates the use of remoted custom
interfaces, you will need the MIDL compiler that comes with the Platform SDK.

I assume that you have a working knowledge of the C++ language, especially the
concepts of virtual functions and abstract base classes. To follow my explanations
on exports and related topics, knowledge of Windows DLLs is advantageous,
although not essential for understanding the structure of COM.

DB: C++ Object Used in a C++ Client

This sample consists of two parts:

• An object that implements features that are good candidates for reuse in
  other projects

• A client application that uses part of the functionality in the object by
  means of a rudimentary user interface

The "Database" Object

The DB sample application implements a simple, database-like object that
manages all its data in memory. Each "database" contains an array of pointers to
the tables inside the database. The tables are arrays of strings, and each array
element represents one row in the table. A parallel string array to the pointer
array contains the "names" of the tables.
[Figure: Structure of the "database": an array of pointers to string arrays (the tables, each string being one row), and a parallel string array holding the name of each table.]

Queries are extremely simple: just indicate a table number (a zero-based index
to the array of table pointers) and the row number (a zero-based index into the
nth string array). Thus, the Read function looks like this:

HRESULT Read(short nTable, short nRow, LPTSTR lpszData);


The application passes in a preallocated buffer of sufficient size, and the object
copies the string data to this buffer.

The Write function looks very similar:

HRESULT Write(short nTable, short nRow, LPCTSTR lpszData);


Additional functions let you create tables (giving each table a name), remove
tables, and obtain information about the database:

typedef long HRESULT;

class CDB {
// Interfaces
public:
    // Interface for data access
    HRESULT Read(short nTable, short nRow, LPTSTR lpszData);
    HRESULT Write(short nTable, short nRow, LPCTSTR lpszData);

    // Interface for database management
    HRESULT Create(short &nTable, LPCTSTR lpszName);
    HRESULT Delete(short nTable);

    // Interface for database information
    HRESULT GetNumTables(short &nNumTables);
    HRESULT GetTableName(short nTable, LPTSTR lpszName);
    HRESULT GetNumRows(short nTable, short &nRows);

// Implementation
private:
    CPtrArray m_arrTables;   // Array of pointers to CStringArray (the "database")
    CStringArray m_arrNames; // Array of table names
public:
    ~CDB();
};

[Figure: Class diagram of CDB: private members m_arrTables (CPtrArray) and m_arrNames (CStringArray); public members Read, Write, Create, Delete, GetNumTables, GetTableName, GetNumRows, and ~CDB.]

All functions return the same kind of error/success code, which, in general, is a
good design for a class, because it facilitates error reporting even on simple
functions such as CDB::GetNumTables, where one is tempted to simply return
the "short" value. If something is wrong with the database, the function
can return something more informative than "0" or "-1".
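To make the data layout and the HRESULT convention concrete, the following is a minimal, portable sketch of such an object. It is not the sample's DBSRV.CPP: it assumes std::vector and std::string in place of MFC's CPtrArray and CStringArray, returns strings through a std::string reference rather than a preallocated buffer, defines its own HRESULT typedef, and uses hypothetical status codes (DB_S_OK, DB_E_BADTABLE, DB_E_BADROW). Delete and GetTableName are left out for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef long HRESULT;

// Hypothetical status codes for this sketch only; a Windows build would use
// the codes from the system headers (for example, NO_ERROR).
const HRESULT DB_S_OK = 0;
const HRESULT DB_E_BADTABLE = -1;
const HRESULT DB_E_BADROW = -2;

class CDB {
public:
    // Interface for data access
    HRESULT Read(short nTable, short nRow, std::string& data) const {
        if (!ValidTable(nTable)) return DB_E_BADTABLE;
        const std::vector<std::string>& table = m_tables[nTable];
        if (nRow < 0 || nRow >= (short)table.size()) return DB_E_BADROW;
        data = table[nRow];
        return DB_S_OK;
    }
    HRESULT Write(short nTable, short nRow, const std::string& data) {
        if (!ValidTable(nTable)) return DB_E_BADTABLE;
        if (nRow < 0) return DB_E_BADROW;
        std::vector<std::string>& table = m_tables[nTable];
        // Assumption: writing past the last row grows the table.
        if (nRow >= (short)table.size()) table.resize(nRow + 1);
        table[nRow] = data;
        return DB_S_OK;
    }

    // Interface for database management
    HRESULT Create(short& nTable, const std::string& name) {
        m_tables.push_back(std::vector<std::string>());
        m_names.push_back(name);                 // parallel array of names
        nTable = (short)(m_tables.size() - 1);   // zero-based index of new table
        return DB_S_OK;
    }

    // Interface for database information
    HRESULT GetNumTables(short& nNumTables) const {
        nNumTables = (short)m_tables.size();
        return DB_S_OK;
    }
    HRESULT GetNumRows(short nTable, short& nRows) const {
        if (!ValidTable(nTable)) return DB_E_BADTABLE;
        nRows = (short)m_tables[nTable].size();
        return DB_S_OK;
    }

private:
    bool ValidTable(short nTable) const {
        return nTable >= 0 && nTable < (short)m_tables.size();
    }
    std::vector<std::vector<std::string> > m_tables; // tables; each string is a row
    std::vector<std::string> m_names;                // table names
};
```

Because every call reports a status, asking for the rows of a nonexistent table yields DB_E_BADTABLE rather than a plausible-looking count.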

The "Database" Client

The client is an MFC/AppWizard-generated Windows-based application. Each
document creates a new CDB object in its OnNewDocument function and stores
a pointer to the object in CDBDoc::m_pDB.

The document class implements four menu items:

• Create

• Write

• Read

• Read multiple

Create adds a new table called "Testing" to the database associated with the
document and saves the number of the last created table in a member variable
(CDBDoc::m_nTable) of CDBDoc.

Write writes a generated string into row 0 of the last created table.

Read reads whatever is stored in row 0 of the last created table and stores it in
CDBDoc::m_csData, which the view class CDBView displays in the client area.

Read multiple performs as many reads as indicated by the "NumCalls" entry in
the "ReadMultiple" section of the DB.INI file, and measures the time this took
using the system-provided tick count.
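The Read multiple command amounts to a timed loop around Read. A minimal sketch, assuming a hypothetical MeasureReads helper that takes the read as a callable, and std::chrono::steady_clock in place of the system tick count (GetTickCount in Win32); reading NumCalls from DB.INI is left out:

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: calls readOnce numCalls times and returns the elapsed
// wall-clock time in milliseconds. steady_clock stands in for the tick count.
long long MeasureReads(int numCalls, const std::function<void()>& readOnce) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < numCalls; ++i) {
        readOnce();  // one Read against the database object
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

In the client, the callable would wrap something like m_pDB->Read(m_nTable, 0, buffer), so the same loop can time each variant of the object as the samples evolve.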

The Reuse Mechanism

The DB sample uses standard C++ source code reuse, for which the client needs
the complete source code for the object and compiles it into the project. The
project uses three directories: \CLIENT, \INTERFACE, and \OBJECT. The
\INTERFACE directory contains the header file DBSRV.H, which declares the object
(see above). This file is used by both the client and the object. The client directly
includes DBSRV.CPP, which is the only file in the \OBJECT directory.

DB_CPPDLL: C++ Object in a DLL, Used by a C++ Client

One severe limitation of the standard C++ reuse model is that if I wanted to sell
my fancy database object, I would have to distribute my complete and valuable
know-how in form of the source code.

A standard solution to this problem under Windows is to package the
implementation into a DLL and provide one or more header files that declare the
functions and structures used. (Microsoft currently ships a large part of Windows
and Windows NT in this form; in fact, the headers are shipped separately from
the DLLs in the form of software development kits.)

To package the implementation into a DLL, you must consider the following
issues:

• Exporting of member functions

• Memory allocation only in the DLL or only in the EXE

• Unicode™/ASCII interoperability

Exporting Member Functions


One simple way to export functions is by using the __declspec(dllexport)
modifier (since Visual C++ 2.0), which can be applied to any function, including
member functions. This instructs the compiler to place an entry into the exports
table, just like declaring an export in the module definition file (.DEF) for the
linker. [In the 16-bit world, _export did the same thing; in addition, the compiler
provided additional code to change to the data segment of the DLL before
entering the function, and then switch back to the caller's data segment before
leaving the function (prologue/epilogue).]

For C++ this is the only practical way to export large numbers of functions,
because C++ provides function overloading (that is, using one function name for
many functions that differ only in the kinds of parameters declared). Thus, the C++
compiler combines all the information it has about a member function (return
type, class, parameter types, and public/private) into one big decorated, or
"mangled," name. (See the technical article "Exporting with Class" in the MSDN
Library Archive for more details.)

By simply applying the __declspec(dllexport) modifier to all the functions in the
CDB class, we make the class exportable in a DLL. We then just have to provide a
make file to create the binary.

Due to the name mangling, it is very difficult for the client to use dynamic
loading: We would have to pass all the decorated names to GetProcAddress and
save the returned pointer somewhere. Then we would have to set up a simulation
class that calls each of these functions. Therefore, it's definitely better to use
implicit linking (using the DB.LIB file generated by the linker).

Another issue, related to name mangling, is incompatibility between compilers.
The name mangling is not standardized, and thus each compiler decorates
functions differently. A DLL compiled by one compiler cannot be used by a client
compiled with another.
If you did not want to give away your source code, you would have to provide all
the compiled versions yourself. Using this technique in a component software
scenario is simply not acceptable. There would have to be many objects with the
same functionality to satisfy all possible clients.

Memory Allocation

Both the DLL and the executable file (EXE) maintain their own lists of allocated
memory blocks (when each links its own copy of the C run-time library), which
they manage through the malloc memory allocator. The C++ new and delete
operators also rely on these lists of memory blocks, and C++ code tends to use
dynamic memory allocation more often than C code does. If the DLL
allocates some memory (for example, for the creation of a new instance of a
class), this memory is marked in the allocation list of the DLL. If the EXE tries to
free this memory, the run-time library looks through its list of allocated memory
blocks and fails (usually with a GP fault). Thus, even if the memory between the
DLL and the EXE is completely shared, the logic for managing allocation breaks if
two modules mix their allocation schemes.

There are basically three solutions to this problem:

• Have the EXE always allocate and free a given kind of memory.

• Have the DLL always allocate and free a given kind of memory.

• Have something neutral (the operating system) allocate and free all the
  memory.

The third approach seems to be the most flexible one, but unfortunately cannot
be used easily for operations involving the C/C++ run-time libraries, even for
basic functions such as new and delete.

From an object-based point of view, it is more convenient to have the memory
allocation done by the object (encapsulation). This way the client does not need
to be aware of the size of the object. If the object's implementation changes,
chances are that the client will still be able to use the object (if the object's
exported functions do not change).

In this sample, we will use a global function to have the object instantiate a copy
of itself (using new) and return the pointer to the client (later, we will see that
COM takes the same approach). The client will then use this pointer in all its calls
to the object's member functions (as a hidden first parameter). The client obtains
the address of the functions (the same for all instances of the object) through
implicit linking of the DLL.

From a global perspective, there can be many things involved in creating an
object. For example, you probably need to provide some password or security
information before you are allowed to create an object. Thus, it can be convenient
to have an additional object that handles instantiation of the actual object instead
of having the client create the object directly.

Our global instantiation function will return an object of another class that will
allow us to produce instances of the CDB class. This class will be called
CDBSrvFactory, and it will have only one member function:
CDBSrvFactory::CreateDB(CDB** ppDB). This function creates the object and
returns a pointer to it through the parameter ppDB (basically, *ppDB=new
CDB): It is a "factory" that produces object instances of a given class. We will
also call this object a class factory object.
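The factory arrangement can be sketched in portable C++ as follows. This is an illustration under assumptions, not the sample's code: there is no DLL boundary here, the "exported" entry point is a plain function, DB_S_OK stands in for NO_ERROR, and a hypothetical g_liveObjects counter makes it visible that every object created by CreateDB is destroyed by its own Release. Making CDB's constructor and destructor private, so that only the factory can create it and only Release can destroy it, is a refinement the article does not use.

```cpp
#include <cassert>

typedef long HRESULT;
typedef unsigned long ULONG;
const HRESULT DB_S_OK = 0;  // stand-in for NO_ERROR from the Windows headers

// Hypothetical counter of live CDB instances, for illustration only.
static int g_liveObjects = 0;

class CDB {
public:
    // The object frees itself, so the delete runs in the module that did the new.
    ULONG Release() {
        delete this;  // no member access after this line
        return 0;
    }
private:
    friend class CDBSrvFactory;  // only the factory may create instances
    CDB() { ++g_liveObjects; }
    ~CDB() { --g_liveObjects; }  // private: clients cannot call delete directly
};

class CDBSrvFactory {
public:
    // Produces a new CDB and hands it back through the out parameter.
    HRESULT CreateDB(CDB** ppObject) {
        *ppObject = new CDB;
        return DB_S_OK;
    }
    ULONG Release() {
        delete this;
        return 0;
    }
};

// Global entry point the client calls to obtain the factory (exported from
// the DLL in the article; an ordinary function in this sketch).
HRESULT DllGetClassFactoryObject(CDBSrvFactory** ppObject) {
    *ppObject = new CDBSrvFactory;
    return DB_S_OK;
}
```

The client never writes new or delete for CDB; both happen in code that would live in the DLL, which is the property the allocation discussion above asks for.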

Unicode/ASCII Interoperability

All samples associated with the DB project can be built in both Unicode and
ASCII. Some of the functions take parameters that are strings. These change
their binary representation when being compiled for Unicode rather than for
ASCII (see the Platform SDK for details on Unicode). This works as long as both
the client and the object are compiled within the same project—they will always
match. If you compile them separately, as we will do here, you can take any of
the following approaches:

• Provide two versions of your object.

• Standardize all function parameters to be ASCII, and convert inside the
  client and/or the server if they compile for Unicode.

• Standardize all function parameters to be Unicode, and convert inside the
  client and/or the server if they compile for ASCII.

Again, providing two versions can be very expensive: You double the size of your
object if you want to be available for both kinds of clients. For a global
component management system like COM, this is definitely not a good idea.

If you want to standardize on one of the two without losing functionality, the
choice is obviously Unicode, because it is a superset of ASCII.

These samples (and COM) standardize on Unicode for any parameters to any
interface that is to be seen by another object. The cost is minimal: If you compile
for ASCII, you will need to convert your strings from ASCII to Unicode before
calling, and from Unicode to ASCII when receiving a parameter. If you compile for
Unicode, there is no performance penalty.
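At the interface boundary, an ASCII build widens strings before a call and narrows them afterward. A minimal sketch, assuming 7-bit ASCII text only and using std::u16string as a portable stand-in for WCHAR buffers (WCHAR is a 16-bit type on Windows). The helper names AsciiToWide and WideToAscii are hypothetical; in Win32 code, MultiByteToWideChar and WideCharToMultiByte do this job and also handle full code pages, which this sketch does not.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: widen 7-bit ASCII to 16-bit code units. Each ASCII
// character maps to the code unit with the same value. Returns false, leaving
// out untouched, if a byte outside ASCII is found.
bool AsciiToWide(const std::string& in, std::u16string& out) {
    std::u16string result;
    result.reserve(in.size());
    for (unsigned char c : in) {
        if (c > 0x7F) return false;  // would need a real code-page conversion
        result.push_back(static_cast<char16_t>(c));
    }
    out = result;
    return true;
}

// Hypothetical helper: narrow 16-bit code units back to ASCII; fails on any
// code unit above 0x7F.
bool WideToAscii(const std::u16string& in, std::string& out) {
    std::string result;
    result.reserve(in.size());
    for (char16_t c : in) {
        if (c > 0x7F) return false;
        result.push_back(static_cast<char>(c));
    }
    out = result;
    return true;
}
```

In a Unicode build both calls disappear, which is why standardizing on Unicode costs nothing there.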

Changes: Step by Step

First we will work on the object:

1. Use AppWizard to create a new Dynamic-Link Library project called
   Object\DB.MAK (not an MFC AppWizard DLL, which creates an MFC
   Extension DLL). Add Object\DBSrv.cpp to the project and copy
   Client\stdafx.h to Object\stdafx.h. Create new targets for Unicode (Win32
   Unicode Debug and Win32 Unicode Release; in Project Settings, include a
   preprocessor symbol UNICODE). Through stdafx.h, set precompiled
   headers for all targets.

Now we will change the interface that the client sees (Interface\DBSrv.h).

Export Member Functions


2. Export all the interface member functions of CDB in Interface\DBSrv.h.
   Use __declspec(dllexport) to instruct the compiler to export these
   functions.

Memory Allocation

3. Add the ULONG CDB::Release() function that will delete the object when
   it is no longer needed (in Interface\DBSrv.h).

4. Declare the class factory object CDBSrvFactory in Interface\DBSrv.h.

5. Declare the function that will return the factory object and export it (in
   Interface\DBSrv.h).

   The resulting header file up to this point should look as follows (new parts
   are marked with "Step" comments):

#define DEF_EXPORT __declspec(dllexport) // Step 2

class CDB {
// Interfaces
public:
    // Interface for data access
    HRESULT DEF_EXPORT Read(short nTable, short nRow, LPTSTR lpszData);
    HRESULT DEF_EXPORT Write(short nTable, short nRow, LPCTSTR lpszData);
    (. . .)

    // Step 3
    ULONG DEF_EXPORT Release(); // Need to free an object from within the DLL.

// Implementation
    (. . .)
};

class CDBSrvFactory { // Step 4
// Interface
public:
    HRESULT DEF_EXPORT CreateDB(CDB** ppObject);
    ULONG DEF_EXPORT Release();
};

HRESULT DEF_EXPORT DllGetClassFactoryObject(CDBSrvFactory ** ppObject); // Step 5

Now we will change the implementation of the object (Object\DBSrv.cpp):

6. Implement CDB::Release() in DBSrv.cpp:

ULONG CDB::Release() {
    delete this; // Cannot access data members after this!
    return 0;
}

7. Implement CDBSrvFactory in a new file called Object\DBSrvFactory.cpp
   (don't forget to add it to the project). Also implement the function that
   returns the CDBSrvFactory object:

#include "stdafx.h"
#include "..\interface\dbsrv.h"

// Create a new database object and return a pointer to it.
HRESULT CDBSrvFactory::CreateDB(CDB** ppObject) {
    *ppObject=new CDB;
    return NO_ERROR;
}

ULONG CDBSrvFactory::Release() {
    delete this;
    return 0;
}

HRESULT DEF_EXPORT DllGetClassFactoryObject(CDBSrvFactory ** ppObject) {
    *ppObject=new CDBSrvFactory;
    return NO_ERROR;
}

Now we will make the necessary changes to the client:

8. Change CDBDoc::~CDBDoc to call m_pDB->Release() instead of
   delete m_pDB.

CDBDoc::~CDBDoc() {
if (m_pDB) {
m_pDB->Release();
m_pDB=NULL;
}
}
9. Obtain CDB* through the class factory object (in CDBDoc::OnNewDocument,
   instead of calling new CDB):

// Create a database object through the exported function and class factory
// object. Using the class factory means that the DB object is both allocated
// and freed inside the DLL.
CDBSrvFactory *pDBFactory=NULL;
DllGetClassFactoryObject(&pDBFactory);
pDBFactory->CreateDB(&m_pDB);
pDBFactory->Release(); // We no longer need the factory.

10. Modify the client project: Remove Object\DBsrv.cpp and instead add
    Object\xxx\db.lib under Project Settings/Linker. (For each target, use the
    appropriate directory, for example, \Windebug.)

Standardize on Unicode

11. Modify the object again: Change all parameters in CDB whose type is
    LPxTSTR to LPxWSTR (that is, LPTSTR to LPWSTR and LPCTSTR to LPCWSTR).
    The T version is the portable one, which compiles to ASCII or Unicode,
    depending on the preprocessor symbol UNICODE. The W version always uses
    wide (16-bit WCHAR) characters instead of 8-bit char.

12. Put in conditional statements (#ifdef UNICODE ... #else) for the case where
    the object is not compiled for Unicode: convert incoming parameters to
    ASCII (Write/Create: use the Win32 API WideCharToMultiByte), and convert
    outgoing parameters to Unicode (Read/GetTableName: use
    MultiByteToWideChar).

13. In the client, when it is not compiled for Unicode (#ifdef UNICODE ...
    #else), convert outgoing parameters to Unicode and incoming parameters to
    ASCII. For Create, use L"xxx" to declare a Unicode string; for Write, use
    MultiByteToWideChar; for Read, use WideCharToMultiByte.

void CDBDoc::OnDatabaseCreate()
{
    m_pDB->Create(m_nTable, L"Testing");
    m_nCount=0; // Set number of writes to 0
}

void CDBDoc::OnDatabaseWrite()
{
    m_nCount++;
    CString csText;
    csText.Format(_T("Test data #%d in table %d, row 0!"), m_nCount,
        (int) m_nTable);

#ifdef UNICODE
    m_pDB->Write(m_nTable, 0, csText);
#else
    WCHAR szuText[80]; // Special treatment for ASCII client
    // The last argument is the buffer size in wide characters, not bytes.
    MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, csText, -1, szuText,
        sizeof(szuText)/sizeof(WCHAR));
    m_pDB->Write(m_nTable, 0, szuText);
#endif
}

void CDBDoc::OnDatabaseRead()
{
#ifdef UNICODE
    m_pDB->Read(m_nTable, 0, m_csData.GetBuffer(80));
#else
    WCHAR szuData[80];
    m_pDB->Read(m_nTable, 0, szuData);
    // Convert to ASCII; 80 is the size in bytes of the CString buffer.
    WideCharToMultiByte(CP_ACP, 0, szuData, -1, m_csData.GetBuffer(80),
        80, NULL, NULL);
#endif

    m_csData.ReleaseBuffer();
    UpdateAllViews(NULL);
}

14. Compile the object before compiling the client, because the client needs
    DB.LIB in order to link.

15. Be sure to either copy the appropriate DB.DLL to the client's directory or
    include the directory of the DLL in the path. Any client should be able to
    work with any server (except, perhaps, for debug versions, since name
    mangling can be different).

Don't miss the opportunity to run DumpBin.EXE /exports db.dll and see all
the cryptic names that the compiler generates for the member functions!

DB_VTBL: C++ Object Exposing "Pure" Abstract Base Class

Problem: C++ Does Not Encapsulate Data Members

One problem with the bare C++ approach is that part of the implementation
details of an object still have to be included in the declaration of the class: The
client "sees" all the private members (arrays and so forth), even though the
compiler will not let the client or a derived class access them.

For the monolithic case, where everything is compiled from the complete source
code, this approach is probably still acceptable, although not necessarily
desirable.

For binary (public) distribution of an object, revealing the implementation details
in a header file is probably still too much. The client (specifically, the compiler,
when compiling the client code) needs to know, at a minimum, the size of the
data members when doing memory allocation for instances of the object, even if
it never accesses any of the data members. When the object implementer
changes the size of the data members, all clients must recompile. Our approach
of having the object do the allocation lets you get away with not recompiling,
because all the client code ever needs from the object are the addresses of its
member functions. But I would definitely not feel comfortable about having a
mission-critical system work with an internal discrepancy such as this.

As stated above, the only information the client needs from the object is the
address of the member functions. Therefore, we could set up a table of pointers
to all the (exported) member functions of the object. Then, when invoking the
function, the client would simply look up the nth member function's address and
jump to it.
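The table-of-addresses idea can be written out by hand before we let the compiler do it. A minimal sketch with hypothetical names (IDBHandle, DBVtbl, DBImpl, CreateDBHandle): the client sees only a pointer to a structure whose first field points at one shared table of function pointers, and every call goes through that table with the object passed explicitly, the way C++ passes "this". The object's real layout stays private to the object's code. COM's C-language bindings use essentially this shape, with an lpVtbl field first.

```cpp
#include <cassert>

typedef long HRESULT;
const HRESULT DB_S_OK = 0;  // hypothetical status code for this sketch

struct DBVtbl;

// What the client sees: one pointer to a table of function addresses.
struct IDBHandle {
    const DBVtbl* lpVtbl;
};

// The table itself. Each entry receives the object explicitly, like "this".
struct DBVtbl {
    HRESULT (*AddRow)(IDBHandle* self);
    HRESULT (*GetNumRows)(IDBHandle* self, short* pnRows);
    void    (*Release)(IDBHandle* self);
};

// Object side: the private layout, never shown to the client.
struct DBImpl {
    IDBHandle base;  // must be the first member so the casts below are valid
    short nRows;
};

static HRESULT Impl_AddRow(IDBHandle* self) {
    reinterpret_cast<DBImpl*>(self)->nRows++;
    return DB_S_OK;
}
static HRESULT Impl_GetNumRows(IDBHandle* self, short* pnRows) {
    *pnRows = reinterpret_cast<DBImpl*>(self)->nRows;
    return DB_S_OK;
}
static void Impl_Release(IDBHandle* self) {
    delete reinterpret_cast<DBImpl*>(self);  // freed by the code that allocated it
}

// One table, shared by all instances, filled in a fixed order.
static const DBVtbl g_dbVtbl = { Impl_AddRow, Impl_GetNumRows, Impl_Release };

IDBHandle* CreateDBHandle() {
    DBImpl* p = new DBImpl;
    p->base.lpVtbl = &g_dbVtbl;
    p->nRows = 0;
    return &p->base;
}
```

A call such as pDB->lpVtbl->GetNumRows(pDB, &n) is exactly "look up the nth entry and jump to it"; the client compiles against nothing but IDBHandle and DBVtbl.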

This approach sounds a little complicated to do in C++: We would have to set up
a "wrapper class" to replicate the functions within the client code and, within each
function, look up and indirectly call the corresponding member function in the
object. In addition, it would be convenient to have a tool that generates these
trivial but tedious-to-implement wrapper functions.

C++ Solution: Abstract Base Class

Moving back from the binary world of member function addresses, let's get back
to the higher levels of C++. There is a feature in C++ that provides a similar
functionality: the abstract base class.

Common Syntax in Derived Classes

The term abstract base class sounds formidable, but actually it is nothing but a
class that declares the syntax of one or more of its functions without
implementing them. This class cannot be instantiated—it merely serves as a base
class for other classes that fill in the missing "link." To be a "real" C++ class that
can be instantiated, the derived class must implement all the "abstract" functions
with exactly the same signature.

Calls to Operations in Derived Classes from Abstract Operations in the Base Class

There is another advantage to abstract base classes, besides letting you force
derived classes to implement an exact function signature: They also let you use
the unimplemented function from code in the base class and have C++ choose
the actual code at run time.

You can provide operations in the base class that use other operations (the
missing functions) to provide higher-level functionality—essentially, an abstract
algorithm with replaceable elementary operations. You could implement a sorting
algorithm as an abstract base class, one that uses comparisons and swaps as
elementary operations. A derived class can then fill in these operations and the
same algorithm can sort completely different objects.
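A sketch of the sorting example, assuming a hypothetical CSorter base class: an insertion sort is written once against three pure virtual operations (Count, Less, Swap), and two derived classes plug in different elementary operations, one ordering integers by value and one ordering strings by length.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Abstract algorithm: Sort uses only the pure virtual elementary operations,
// which each derived class must supply.
class CSorter {
public:
    virtual ~CSorter() {}
    virtual int  Count() const = 0;
    virtual bool Less(int i, int j) const = 0;  // is element i before element j?
    virtual void Swap(int i, int j) = 0;        // exchange elements i and j

    // Insertion sort: move each element left while it is "less" than its neighbor.
    void Sort() {
        for (int i = 1; i < Count(); ++i) {
            for (int j = i; j > 0 && Less(j, j - 1); --j) {
                Swap(j, j - 1);
            }
        }
    }
};

// Orders integers by value.
class CIntSorter : public CSorter {
public:
    explicit CIntSorter(std::vector<int>& v) : m_v(v) {}
    int  Count() const { return (int)m_v.size(); }
    bool Less(int i, int j) const { return m_v[i] < m_v[j]; }
    void Swap(int i, int j) { std::swap(m_v[i], m_v[j]); }
private:
    std::vector<int>& m_v;
};

// Orders strings by length: same algorithm, different elementary operations.
class CLengthSorter : public CSorter {
public:
    explicit CLengthSorter(std::vector<std::string>& v) : m_v(v) {}
    int  Count() const { return (int)m_v.size(); }
    bool Less(int i, int j) const { return m_v[i].size() < m_v[j].size(); }
    void Swap(int i, int j) { std::swap(m_v[i], m_v[j]); }
private:
    std::vector<std::string>& m_v;
};
```

When Sort is called through a CSorter* that points at a CLengthSorter, C++ picks CLengthSorter's Less and Swap at run time.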

If two classes derive from the same abstract base class, they can share the same
abstract operations defined in the base class, and yet they can provide very
different functionality by providing different elementary operations. The code in
the base class calls functions depending on the class of the actual instance of the
object it operates on.

Common Interface to All Derived Classes

A side effect of this architecture is that you can cast a pointer to an instance of a
class derived from an abstract base class back to the base class, and then you
can use this pointer to invoke functions in the derived class.

If multiple classes derive from one abstract base class, a client can invoke the
same function on two different objects and C++ will find—at run time—the
correct function in the correct derived class.

Implementation Secret: Virtual Functions

How does the compiler accomplish this magic? All the functions in an abstract
base class must be virtual functions: You can call a virtual function on a pointer to
a base class, and the compiler will call the correct implementation depending on
the actual object referred to by the pointer.

How does the compiler implement virtual functions? The compiler builds a table
for each class that contains pointers to all the virtual functions declared in the
class, whether they are declared (and possibly implemented) in a base class or
declared in the derived class. The table is filled in "order of declaration"; that is, if
there is no base class with virtual functions, the first declared member function in
the derived class (source code order) corresponds to the first entry in the virtual
function table for this class. If there are virtual functions declared in base classes,
they go before the newly defined virtual functions. The virtual function table is
also called the vtable of the class.

Note the following:

• The table is built on a per-class basis, not for each instance of the class.

• The memory image of an instance of a class has one pointer to the vtable
  of this class, stored in front of the actual data members of the object.
  Thus an object that uses virtual functions occupies 4 additional bytes (on
  the 32-bit platforms discussed here) for each instance. The vtable itself
  occupies 4 bytes for each virtual function, but there is only one vtable per
  class and it is shared by all instances.

  When you invoke a virtual function from a base class, the C++ compiler
  has a pointer to the object's instance data (the "this" pointer). It obtains
  the pointer to the vtable of the class from the first 4 bytes of the object's
  instance data. Each virtual function has a unique index in the vtable of a
  given class. The compiler simply obtains the function address from the
  vtable at the function's index and branches to this address. When you
  invoke calls through a pointer to the base class, the technique is the same.

• Non-virtual functions do not appear in the vtable.
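The per-instance cost can be observed with sizeof. A small sketch with hypothetical class names; the exact numbers are implementation-defined (the C++ standard does not mention vtables at all), but with mainstream compilers such as MSVC, GCC, and Clang a class with virtual functions grows by one hidden pointer per instance (plus any alignment padding), no matter how many virtual functions it declares. That pointer is 4 bytes on the 32-bit platforms discussed here and 8 bytes on 64-bit targets.

```cpp
#include <cassert>

struct NoVirtuals {
    int data;
    int Get() const { return data; }  // non-virtual: needs no table entry
};

struct OneVirtual {
    int data;
    virtual int Get() const { return data; }
    virtual ~OneVirtual() {}
};

struct ManyVirtuals {
    int data;
    virtual int Get() const { return data; }
    virtual int Twice() const { return 2 * data; }
    virtual int Thrice() const { return 3 * data; }
    virtual ~ManyVirtuals() {}
};

// Assumes a typical vtable-based compiler: virtual functions add one hidden
// pointer per object, and more virtual functions enlarge only the shared table.
const bool kHasVptr = sizeof(OneVirtual) >= sizeof(NoVirtuals) + sizeof(void*);
const bool kSameCost = sizeof(OneVirtual) == sizeof(ManyVirtuals);
```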

Figure 1. Memory layout of an instance of an object with virtual classes.


[Reprinted from "Object Mapping in C++," by Tom Germond, Jan Gray, Dale E.
Rogerson. MSDN Library Archive, Technical Articles, C/C++ Articles.]

Using "Pure" Abstract Base Classes as Function Table Wrappers

When I described the problem of changing implementations of objects and
stated that the client needs only the addresses of the object's member functions,
I proposed a tool to generate wrappers for tables with the exported functions of
an object.

Well, here it is: your C++ compiler. As we saw above, virtual functions are
implemented exactly that way: The compiler sets up a table with the addresses of
the functions (the vtable) and lets you wrap these functions with an abstract base
class that simply maps calls to members through the table. As you will see, our
samples will use abstract base classes that provide no actual code: all the
members will be pure virtual functions, and all they will do is let you invoke
functions in any derived class, once you have cast a pointer to the base class. For
the purposes of this article, I call this kind of abstract base class a pure abstract
base class; COM calls it an interface.

Let's use this idea, to begin with, in a subdirectory of the DB sample that has
everything compiled into one executable file. Even in this situation, using abstract
base classes can be advantageous. You can make sure that you will not
accidentally use implementation data that for technical reasons you had to make
"public" for C++.

Changes: Step by Step

Use the DB_CPPDLL sample as the basis. The next sample will be very similar to
this one, so if you are short of time, you can skip this one. Its primary purpose is
to illustrate that using abstract base classes as function table wrappers is a
standard C++ feature, which can also be very useful in a non-component world.

1. Copy the header file (interface\dbsrv.h) to \object\dbsrvimp.h. The first

file will contain the abstract base class, which is the only code the client

needs. The second file will be the header for the actual implementation.

2. Make all functions in interface\dbsrv.h pure virtual functions and remove the
data members. Change the name of the class from CDB to IDB (Interface to DB).

3. Derive CDB in object\dbsrvimp.h from IDB and include ..\interface\dbsrv.h.
(Don't forget to change the #ifndef at the beginning of the file to something
like _DBSRVIMP_INCLUDE.)

4. Because the client cannot instantiate an abstract base class, the object has to
instantiate itself and return a pointer to itself, cast to IDB*: Declare and define
a CreateDB() function in dbsrv.h/dbsrv.cpp that does this.

The resulting header file Interface\dbsrv.h:

class IDB {
// Interfaces
public:
// Interface for data access
virtual HRESULT Read(short nTable, short nRow, LPTSTR lpszData) =0;
virtual HRESULT Write(short nTable, short nRow, LPCTSTR lpszData) =0;
(...)
};

HRESULT CreateDB(IDB** ppObj);


The new header "Object\dbsrvimp.h":

#include "..\Interface\dbsrv.h"
typedef long HRESULT;
class CDB : public IDB {
// Interfaces
public:
// Interface for data access
HRESULT Read(short nTable, short nRow, LPTSTR lpszData);
(...)
};

5. Make dbsrv.cpp include dbsrvimp.h instead of dbsrv.h.

The resulting DBSRV.CPP (unchanged except for the new instantiation function):

(...)

HRESULT CreateDB(IDB** ppObj) {
*ppObj=(IDB*) new CDB; // Cast to abstract base class.
return NO_ERROR;
}

6. Now we will adjust the client: Change the type of the client's m_pDB member
from CDB* to IDB*, and use CreateDB instead of new to instantiate it.
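Put together, the six steps amount to the pattern sketched below, condensed into one file for illustration. Assumptions: HRESULT, NO_ERROR, E_FAIL, and E_OUTOFMEMORY are local stand-ins carrying the same values as in the Windows headers, so the sketch compiles without windows.h; std::string replaces LPTSTR; and IDB is given a Release member (an assumption in this sketch) so that the client can free the object without ever seeing CDB.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <new>
#include <string>
#include <utility>

// Stand-ins for Windows SDK definitions (same values as winerror.h).
typedef std::int32_t HRESULT;
const HRESULT NO_ERROR = 0;
const HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);
const HRESULT E_OUTOFMEMORY = static_cast<HRESULT>(0x8007000Eu);

// ---- interface\dbsrv.h: all the client needs ----
class IDB {
public:
    virtual HRESULT Read(short nTable, short nRow, std::string& data) = 0;
    virtual HRESULT Write(short nTable, short nRow, const std::string& data) = 0;
    virtual unsigned long Release() = 0;   // assumed: frees the object for the client
};
HRESULT CreateDB(IDB** ppObj);

// ---- object\dbsrvimp.h and dbsrv.cpp: hidden from the client ----
class CDB : public IDB {
public:
    HRESULT Read(short nTable, short nRow, std::string& data) override {
        auto it = m_rows.find(std::make_pair(nTable, nRow));
        if (it == m_rows.end()) return E_FAIL;
        data = it->second;
        return NO_ERROR;
    }
    HRESULT Write(short nTable, short nRow, const std::string& data) override {
        m_rows[std::make_pair(nTable, nRow)] = data;
        return NO_ERROR;
    }
    unsigned long Release() override { delete this; return 0; }
private:
    std::map<std::pair<short, short>, std::string> m_rows;
};

// The object instantiates itself and hands out only the abstract base class.
HRESULT CreateDB(IDB** ppObj) {
    assert(ppObj != nullptr);
    *ppObj = new (std::nothrow) CDB;   // implicit conversion to IDB*
    return *ppObj != nullptr ? NO_ERROR : E_OUTOFMEMORY;
}
```

The client code now depends only on the IDB declaration and the CreateDB prototype; CDB's members can change without touching the client, as long as the order and signatures of IDB's functions stay the same.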

Everything works just as before. At the binary level, there is a slight overhead
for the indirect function call (get the vtable pointer from the instance data, get
the function pointer from the vtable, jump), but the performance measured with
"Read Multiple" is not significantly affected. The actual "work" performed in the
member functions takes orders of magnitude longer than these two additional
memory accesses.

DB_VTBLDLL: C++ Object in a DLL Exposing "Pure" Abstract Base Class

It is even more beneficial to apply the technique of abstract base classes to the
scenario of binary packaging in DLLs. In the previous DLL sample, we had to
export each function of the object, and we depended on the name-mangling
scheme of the compiler. There is no standard for name-mangling, so you would
have to provide a separate build of the object for each compiler that a potential
client might want to use.

Remember that pure abstract base classes provide little more than a table of
function entry points, and that these tables are allocated by the object's
compiler. They are initialized at load time and not freed until the module is
unloaded. If we create object instances indirectly, memory management is not a
problem. Code generated by a client's compiler can freely access all these tables
through the vtable pointer, which (for an object that implements its interface
through single inheritance) is located in the first 4 bytes of the object's memory
image on Win32. The mainstream C++ compilers for Windows use the same memory
layout for such vtables, so vtables are actually a de facto binary standard in the
C++ world.
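The layout claims in this paragraph can be made concrete with a small sketch. The checks below encode what MSVC, GCC, and Clang do in practice, not something the C++ standard guarantees: an abstract base class without data members occupies exactly one pointer (the vtable pointer), and when an implementation class derives from it through single inheritance, that pointer sits at the very start of the object. IDB, CDB, and InterfaceStartsObject are simplified, hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstring>

// Simplified interface: nothing but pure virtual functions.
class IDB {
public:
    virtual long Read(short nTable, short nRow, char* pszData) = 0;
    virtual long Write(short nTable, short nRow, const char* pszData) = 0;
};

class CDB : public IDB {
public:
    long Read(short, short, char* pszData) override {
        assert(pszData != nullptr);
        std::strcpy(pszData, "data");
        return 0;
    }
    long Write(short, short, const char*) override { ++m_nWrites; return 0; }
    int m_nWrites = 0;   // implementation data is laid out after the vtable pointer
};

// A data-free abstract base class is just the vtable pointer
// (4 bytes on Win32, 8 bytes on 64-bit platforms).
static_assert(sizeof(IDB) == sizeof(void*), "IDB should be exactly one vtable pointer");

// With single inheritance, the IDB subobject (and thus the vtable pointer)
// starts at the first byte of the CDB object.
bool InterfaceStartsObject(CDB& obj) {
    IDB* pIface = &obj;
    return static_cast<void*>(pIface) == static_cast<void*>(&obj);
}
```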

Thus, this technique lets you export C++ classes without exporting any function
names. (16-bit Windows still requires the _export tag or compilation with a
special compiler option, because we need prologue/epilogue code to switch
between the different data segments of the DLL and the EXE.) All that the client
and the object have to agree on is the layout of the vtable: the order of the
functions in the vtable, the number and kind of parameters for each function, and
the way the parameters are passed (the calling convention; in most cases,
parameters are passed on the stack).

The layout of the vtable and the function parameters is completely defined by an
abstract base class, so that a C++ header file can serve as a complete description
of the syntax of an interface contract.

Changes: Step by Step

This sample is based on the first DLL subdirectory, DB_CPPDLL. The changes are
almost identical to the ones for DB_VTBL, except that we already implemented
the indirect instantiation of CDB for other reasons (memory management!).

1. Copy the interface\dbsrv.h header file to object\dbsrvimp.h. The first file will
contain the abstract base class, which is the only code the client needs. The
second file will be the header for the actual implementation.

2. Make all functions in interface\dbsrv.h (both CDB and CDBSrvFactory) pure
virtual functions and remove the data members. Remove the export declaration
from all member functions. Change the name of the class CDB to IDB (Interface
to DB), and CDBSrvFactory to IDBSrvFactory.

3. Derive CDB in object\dbsrvimp.h from IDB and include ..\interface\dbsrv.h.
(Don't forget to change the #ifndef at the beginning of the file to something
like _DBSRVIMP_INCLUDE.) Derive CDBSrvFactory from IDBSrvFactory.

4. Change the parameter of IDBSrvFactory::CreateDB to IDB** instead of CDB**.
Change the parameter of DllGetClassFactoryObject to IDBSrvFactory**.

The following code illustrates the resulting Interface\dbsrv.h header file:

class IDB {
// Interfaces
public:
// Interface for data access.
virtual HRESULT Read(short nTable, short nRow, LPWSTR lpszData) =0;
(...)
};
class IDBSrvFactory {
// Interface
public:
virtual HRESULT CreateDB(IDB** ppObject) =0;
virtual ULONG Release() =0;
};
HRESULT DEF_EXPORT DllGetClassFactoryObject(IDBSrvFactory ** ppObject);

The new Object\dbsrvimp.h header:

#include "..\Interface\dbsrv.h"

class CDB : public IDB {
// Interfaces
public:
// Interface for data access.
HRESULT Read(short nTable, short nRow, LPWSTR lpszData);
};
class CDBSrvFactory : public IDBSrvFactory {
// Interface
public:
HRESULT CreateDB(IDB** ppObject);
ULONG Release();
};

5. Make object\dbsrv.cpp and object\dbsrvfact.cpp include dbsrvimp.h instead of
dbsrv.h.

The resulting DBSrvFact.CPP (unchanged except for the adjusted instantiation
functions):

HRESULT CDBSrvFactory::CreateDB(IDB** ppvDBObject) {
*ppvDBObject=(IDB*) new CDB;
return NO_ERROR;
}
(...)
HRESULT DEF_EXPORT DllGetClassFactoryObject(IDBSrvFactory ** ppObject) {
*ppObject=(IDBSrvFactory*) new CDBSrvFactory;
return NO_ERROR;
}

6. Now we will adjust the client: Change the type of the client's m_pDB member
from CDB* to IDB*. In CDBDoc::OnNewDocument, change CDBSrvFactory* to
IDBSrvFactory*.
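Condensed into a single file, the DLL's side of this contract and the client's use of it look roughly like the following sketch. Assumptions: the Windows types and error codes are local stand-ins with the winerror.h values, DEF_EXPORT is dropped because nothing is actually exported here, IDB is reduced to a single table and gets a Release member, and OpenDatabase is a hypothetical helper in the spirit of CDBDoc::OnNewDocument, not the sample's code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <new>
#include <string>

// Stand-ins for Windows SDK definitions (same values as winerror.h).
typedef std::int32_t HRESULT;
const HRESULT NO_ERROR = 0;
const HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);
const HRESULT E_OUTOFMEMORY = static_cast<HRESULT>(0x8007000Eu);

// ---- interface\dbsrv.h: the complete contract ----
class IDB {
public:
    virtual HRESULT Read(short nRow, std::string& data) = 0;
    virtual HRESULT Write(short nRow, const std::string& data) = 0;
    virtual unsigned long Release() = 0;
};
class IDBSrvFactory {
public:
    virtual HRESULT CreateDB(IDB** ppObject) = 0;
    virtual unsigned long Release() = 0;
};
HRESULT DllGetClassFactoryObject(IDBSrvFactory** ppObject);   // the only exported name

// ---- object\dbsrv.cpp, object\dbsrvfact.cpp: invisible to the client ----
class CDB : public IDB {
public:
    HRESULT Read(short nRow, std::string& data) override {
        auto it = m_rows.find(nRow);
        if (it == m_rows.end()) return E_FAIL;
        data = it->second;
        return NO_ERROR;
    }
    HRESULT Write(short nRow, const std::string& data) override {
        m_rows[nRow] = data;
        return NO_ERROR;
    }
    unsigned long Release() override { delete this; return 0; }
private:
    std::map<short, std::string> m_rows;   // a single table, for brevity
};

class CDBSrvFactory : public IDBSrvFactory {
public:
    HRESULT CreateDB(IDB** ppObject) override {
        *ppObject = new (std::nothrow) CDB;
        return *ppObject != nullptr ? NO_ERROR : E_OUTOFMEMORY;
    }
    unsigned long Release() override { delete this; return 0; }
};

HRESULT DllGetClassFactoryObject(IDBSrvFactory** ppObject) {
    assert(ppObject != nullptr);
    *ppObject = new (std::nothrow) CDBSrvFactory;
    return *ppObject != nullptr ? NO_ERROR : E_OUTOFMEMORY;
}

// ---- client: hypothetical helper in the spirit of CDBDoc::OnNewDocument ----
IDB* OpenDatabase() {
    IDBSrvFactory* pFactory = nullptr;
    if (DllGetClassFactoryObject(&pFactory) != NO_ERROR) return nullptr;
    IDB* pDB = nullptr;
    HRESULT hr = pFactory->CreateDB(&pDB);
    pFactory->Release();   // the factory is not needed once the object exists
    return hr == NO_ERROR ? pDB : nullptr;
}
```

Only DllGetClassFactoryObject is referenced by name from the client side; every other call goes through a vtable, which is what makes the single exported entry point possible.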

Everything works just as before. The performance is identical to that of the prior
version of the code where everything is compiled into one executable! The
compiler does the same indirect function call through the vtable.

Run DumpBin.EXE /exports db.dll and you will see that the only exported entry
point is DllGetClassFactoryObject!

All the rest of the dangerous, mangled names have disappeared. All member
functions are now cleanly accessed through vtables. We could even dare to load
the DLL dynamically (through LoadLibrary/GetProcAddress), because there is
only one exported entry point.

DBAlmostCOM: C++ Object in a DLL, Loaded Through COM

The previous sample provided a very flexible mechanism for object packaging:
Only one entry point had to be declared in the DLL, and all other interchange was
done through pointers to pure abstract base classes.

Next, we will make some minor changes that will make this object almost
COM-compliant.

Note: This sample will not work on Windows 2000 or later, because the object
does not follow all the COM rules.

Single Standard Entry Point for Object Creation

Suppose you want to pack multiple objects into a single DLL. Basically, you have
two options:

• Provide one entry point function for each class you want to export.

• Pass an additional parameter to a standard entry point, indicating which class
you want.

The second approach seems more appropriate for a generic component model,
because it provides one central entry point for creating all the objects you want.
COM does exactly this: It requires all objects implemented in DLLs (also called
in-process servers) to export this entry point as the DllGetClassObject function.

This function receives a parameter of type CLSID that specifies which class the
caller wants to access. The CLSID is nothing but a relatively big number (16
bytes; yes, bytes, not bits, in case many people want to create objects).

The DLL checks the number of the class that is requested, and if it provides that
class, it returns a pointer to an object that implements the creation method for
the actual object. This is the architecture we used already for both of our DLL
samples. Introducing this intermediate object allows for flexible instantiation of
objects. There is another technical reason for this approach that is related to
objects implemented in executables, which we will discuss later.

Microsoft provides a tool for generating CLSIDs that are guaranteed to be globally
unique. This is done (a little simplified) by using the worldwide unique ID of the
network card, combined with the current date and time. This combination fits into
16 bytes, plus there is plenty of space left over for a simple counter that can give
you an (almost) arbitrarily large range of contiguous CLSIDs. Because these
numbers are used for many things, not just as class identifiers, they are called
globally unique identifiers (GUIDs).
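Since a GUID is just a 16-byte number, it maps onto a plain structure. The sketch below mirrors the layout of the GUID structure in the Windows headers and formats it in the braced string form used for registry keys. The struct and IsEqualGUID are local stand-ins for the SDK definitions so that the code compiles anywhere; GuidToString is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Same layout as the GUID structure in the Windows headers: 4 + 2 + 2 + 8 = 16 bytes.
struct GUID {
    std::uint32_t Data1;
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint8_t  Data4[8];
};
static_assert(sizeof(GUID) == 16, "a GUID is 128 bits");

// Two GUIDs are equal when all 16 bytes match (the struct has no padding).
inline bool IsEqualGUID(const GUID& a, const GUID& b) {
    return std::memcmp(&a, &b, sizeof(GUID)) == 0;
}

// Hypothetical helper: the braced string form used for registry keys.
std::string GuidToString(const GUID& g) {
    auto u = [](unsigned v) { return v; };   // pass every field to printf as unsigned
    char buf[39];                            // 38 characters plus the terminating null
    int n = std::snprintf(buf, sizeof buf,
                          "{%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
                          u(g.Data1), u(g.Data2), u(g.Data3),
                          u(g.Data4[0]), u(g.Data4[1]), u(g.Data4[2]), u(g.Data4[3]),
                          u(g.Data4[4]), u(g.Data4[5]), u(g.Data4[6]), u(g.Data4[7]));
    assert(n == 38);
    return buf;
}
```

For example, { 0x30df3430, 0x266, 0x11cf, { 0xba, 0xa6, 0x0, 0xaa, 0x0, 0x3e, 0xe, 0xed } } formats as {30DF3430-0266-11CF-BAA6-00AA003E0EED}.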

Here is our standard entry-point function in completed form:

STDAPI DllGetClassObject(REFCLSID rclsid, REFIID riid, LPVOID * ppObject);


STDAPI is just a macro that the preprocessor expands to HRESULT plus the standard
calling convention (on Win32, extern "C" HRESULT __stdcall).

We find our CLSID (here declared as a reference to a CLSID), and our "pointer to
a pointer" that receives the pointer to the object's instance. Note that this pointer
is declared as "void," because we will return arbitrary base class pointers through
this parameter.
What about the additional parameter, riid, that looks like another of those GUIDs?
IID stands for interface ID. With the riid parameter, the caller indicates what kind
of pointer it expects to be returned through ppObject. The object returned is the
class factory object, and the interface ID assures the caller that it "understands"
the pointer. The object will check that the caller actually requests our interface
IDBSrvFactory, for which we will designate a unique IID.

Standard Object Creation API

Once this standard entry point is exported, the COM libraries can manage our DLL
as an object, even though it is not yet a real COM object. We can register our
object—its CLSID and the path to the DLL—and have COM worry about loading
the DLL. Before, we used implicit DLL linking in our client: The linker put in some
code to load the specific DLL we had compiled with, and the name of the DLL was
hard-coded into the client's executable. Now we will just tell the COM libraries the
CLSID of the object that we want, and COM will find and load the DLL for us.

The API that COM provides for this purpose is CoGetClassObject:

HRESULT CoGetClassObject(
REFCLSID rclsid,
DWORD dwClsContext,
LPVOID pvReserved,
REFIID riid,
LPVOID * ppv);

The first parameter is the CLSID of the object that we want to load. COM looks
for it in the registry under HKEY_CLASSES_ROOT\CLSID\{xxx}.

The second and third parameters give more information about the activation
context of our object. For now we will just ask for any server (CLSCTX_SERVER),
and pass NULL for pvReserved.

The next parameter is the IID of the initial abstract base class pointer that we
want to retrieve from the object, and the last parameter is our pointer to a
pointer that will receive the pointer to the object.

But how does CoGetClassObject find the DLL that implements this CLSID, if the
CLSID is just a number (admittedly, a large one) with no encoded information?

Standard Object Registration

All the object-related information that the COM libraries need is concentrated
under one entry in the registry: HKEY_CLASSES_ROOT\CLSID. Each object has a
key that is named with the string representation of its CLSID. Under this key, the
COM libraries look for the information they need in order to create an object.

For now, we just need one of these entries: InprocServer32. This entry
indicates the path to a DLL with the standard entry-point mechanism. Thus, given
the CLSID, the COM library can look under HKEY_CLASSES_ROOT\CLSID, find the
appropriate key for the object, look for the sub-key InprocServer32, and obtain
the name of the DLL.

CoGetClassObject then loads the DLL (using LoadLibrary), obtains the entry
point for the DllGetClassObject function (using the Win32 GetProcAddress
function), calls DllGetClassObject with the parameters the caller provided, and
passes the pointer to the class object back to the client.
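The lookup that CoGetClassObject performs can be modeled in a few lines. This is a toy model under stated assumptions, not COM's implementation: one std::map stands in for the registry and another for LoadLibrary/GetProcAddress, GUIDs are represented by their registry strings, and ToyCoGetClassObject, g_registry, and g_loadableDlls are hypothetical names. REGDB_E_CLASSNOTREG and CO_E_DLLNOTFOUND are real COM error codes, defined locally with their winerror.h values.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Stand-ins for Windows SDK definitions (same values as winerror.h).
typedef std::int32_t HRESULT;
const HRESULT S_OK = 0;
const HRESULT REGDB_E_CLASSNOTREG = static_cast<HRESULT>(0x80040154u);
const HRESULT CO_E_DLLNOTFOUND = static_cast<HRESULT>(0x800401F8u);

// Shape of the exported entry point (GUIDs simplified to their string form).
typedef HRESULT (*PFN_DLLGETCLASSOBJECT)(const std::string& clsid,
                                         const std::string& iid, void** ppv);

// Toy registry: key path under HKEY_CLASSES_ROOT -> default value.
std::map<std::string, std::string> g_registry;
// Toy LoadLibrary + GetProcAddress: DLL path -> its DllGetClassObject.
std::map<std::string, PFN_DLLGETCLASSOBJECT> g_loadableDlls;

HRESULT ToyCoGetClassObject(const std::string& clsid, const std::string& iid, void** ppv) {
    assert(ppv != nullptr);
    *ppv = nullptr;
    // 1. Find CLSID\{...}\InprocServer32 and read the DLL path from it.
    auto key = g_registry.find("CLSID\\" + clsid + "\\InprocServer32");
    if (key == g_registry.end()) return REGDB_E_CLASSNOTREG;
    // 2. "Load" the DLL and look up its DllGetClassObject entry point.
    auto dll = g_loadableDlls.find(key->second);
    if (dll == g_loadableDlls.end()) return CO_E_DLLNOTFOUND;
    // 3. Pass the caller's parameters through; the returned pointer is not interpreted.
    return dll->second(clsid, iid, ppv);
}
```

Step 3 hands back whatever the DLL stored in *ppv; like COM in the in-process case, the model never looks at that pointer.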

From there on, everything is just between the object and the client: The pointer
returned is an indirect pointer to the vtable of the class factory implemented by
the object. [Theoretically the pointer does not even have to be a pointer to a
vtable, because it is typed as void. You could simply return a pointer to a
standard C++ object (if you don’t need to access functions), any other type of
function lookup table, or whatever you wanted. It does not make sense to do so,
but I want to make it clear that COM (in the in-process case) does no
interpretation whatsoever on this returned pointer.] Any call on this pointer is a
normal C++ virtual function call. COM does not interfere in this process; it just
provides the service of locating and loading the object. The performance of this
sample is exactly identical to that of the pure vtable-based sample.

Changes: Step by Step

1. Generate two GUIDs using GUIDGEN.EXE: Choose the third option, "struct . . .
GUID", and copy and paste the new GUIDs one by one to interface\dbsrv.h; they
are part of the contract between the object and the client (see the Note below).
Name them CLSID_DBSAMPLE and IID_IDBSrvFactory, respectively. Leave just the
declaration part ("extern") in dbsrv.h and put the definition in
object\dbsrvfact.cpp. (Later we will see two other methods for managing GUIDs
within source files: one with the DEFINE_GUID macro and the other when we use
the Interface Definition Language [IDL] to describe interfaces.)

Note: If you want to be prepared for future needs of GUIDs, and want to have
them in consecutive order, take time now to generate some 10 or 20 (using the
“New GUID” button) and paste them into a separate file somewhere (perhaps a
Microsoft Excel spreadsheet?). If you need a lot of GUIDs, you might also want to
look at the command-line utility uuidgen.exe, which allows automatic generation
of multiple GUIDs (/n parameter). The advantages of keeping your GUIDs together
range from “aesthetic” (all registry keys appear together) to more relevant
performance issues (COM needs to look up GUIDs quite often and works a little
more efficiently on clusters of GUIDs than on widely separated GUIDs).

The following code shows the definition of the GUIDs in object\dbsrvfact.cpp
(they look dangerous, but are nothing but really big numbers):

// Not "static": dbsrv.h declares these GUIDs extern.
// {30DF3430-0266-11cf-BAA6-00AA003E0EED}
const GUID CLSID_DBSAMPLE =
{ 0x30df3430, 0x266, 0x11cf, { 0xba, 0xa6, 0x0, 0xaa, 0x0, 0x3e, 0xe, 0xed } };
// {30DF3431-0266-11cf-BAA6-00AA003E0EED}
const GUID IID_IDBSrvFactory =
{ 0x30df3431, 0x266, 0x11cf, { 0xba, 0xa6, 0x0, 0xaa, 0x0, 0x3e, 0xe, 0xed } };

2. Remove the declaration of DllGetClassFactoryObject from dbsrv.h (its
replacement, DllGetClassObject, is already declared in OLE2.H), and change the
implementation in dbsrvfact.cpp to the following:

STDAPI DllGetClassObject(REFCLSID rclsid, REFIID riid, void** ppObject)

Validate the two parameters as CLSID_DBSAMPLE and IID_IDBSrvFactory,
respectively. If one of them does not have the expected value, return
CLASS_E_CLASSNOTAVAILABLE or E_INVALIDARG, as follows:

STDAPI DllGetClassObject(REFCLSID rclsid, REFIID riid, void** ppObject) {
*ppObject=NULL; // Clear the out-parameter so that it is NULL on failure.
if (rclsid!=CLSID_DBSAMPLE) { // Is this the number of our class?
return CLASS_E_CLASSNOTAVAILABLE;
}
if (riid!=IID_IDBSrvFactory) { // Is this the number of our interface?
return E_INVALIDARG;
}
*ppObject=(IDBSrvFactory*) new CDBSrvFactory;
return NO_ERROR;
}

3. Include ole2.h as follows: In stdafx.h append "#include <ole2.h>". Add
"#define _AFX_NO_BSTR_SUPPORT" before the other includes, because the MFC
header files define some symbols differently than OLE2.H does. (Be careful not
to include afxole.h, because doing so provides an ASCII/Unicode mapping layer,
which can get in your way when using some COM interfaces or APIs. This mapping
layer was removed in MFC 4.0.)

4. Create a module definition file, DB.DEF, and export DllGetClassObject. (You
cannot use __declspec(dllexport) because the function is already declared in
ole2.h with different modifiers.)

5. For the client, in client\dbdoc.cpp, call COM instead of loading our DLL
directly: Call CoGetClassObject instead of DllGetClassFactoryObject. Validate the
result using the FAILED macro provided by OLE2.H:

HRESULT hRes;
hRes=CoGetClassObject(CLSID_DBSAMPLE, CLSCTX_SERVER, NULL,
IID_IDBSrvFactory, (void**) &pDBFactory);
if (FAILED(hRes)) {
CString csError;
csError.Format(_T("Error 0x%08lX creating DB Object!"), hRes); // Hex, as in WINERROR.H.
AfxMessageBox(csError);
return FALSE;
}

6. Call CoInitialize(NULL) in CDBApp::InitInstance to initialize the COM libraries,
and call CoUninitialize() in CDBApp::ExitInstance.

7. Add the definitions of CLSID_DBSAMPLE and IID_IDBSrvFactory to the beginning
of dbdoc.cpp. Instead of hard-coding the name of the DLL, we now hard-code the
CLSID of the object that we want to use!

8. Add ole32.lib to the project and remove db.lib from the project (in
Project/Setting/Linker).

9. Change the client's stdafx.h to include ole2.h and to
#define _AFX_NO_BSTR_SUPPORT, just as in step 3.

10. Register the DLL in the registry under HKEY_CLASSES_ROOT\CLSID\{<<your
clsid>>}\InprocServer32={your path}db.dll. (If your DLL is in your system path or
in the same directory as the client, you do not need to specify the complete path
to the DLL.) All GUIDs in the registry appear in string form within braces. You
can get your GUID's string representation from the definition provided by
GUIDGEN (the comment line that GUIDGEN writes above it).

a. Run Regedt32.exe and open HKEY_CLASSES_ROOT\CLSID.

b. Add a Key with the name of your CLSID:
{30DF3430-0266-11cf-BAA6-00AA003E0EED}

c. To this new Key add an unnamed Value, with Data Type = REG_SZ and String =
DB Sample Object. This string is not used by COM but can help you to find your
CLSID later.

d. Under the CLSID key, add a Key named InprocServer32.

e. To InprocServer32 add an unnamed Value, with Data Type = REG_SZ and String =
<path>\db.dll, replacing <path> with the actual path of your DLL.

Alternatively, you can register just db.dll (without a path) and add the DLL's
directory to the system or user path.

Compile the object and the client. (This time the order does not matter because
we no longer link to the DLL.) If you receive errors when creating the object, you
can look up the error codes (hRes) in WINERROR.H. The most probable failure is
an incorrect registration of the DLL or a non-exported entry point for
DllGetClassObject. (Of course, this has never happened to me!)

You can still mix and match Unicode/ASCII and Debug/Release clients and
servers.

DBCOM: COM Object in a DLL, Loaded Through COM

The previous sample illustrated a portion of COM's infrastructure for creating
instances of objects. In order to make our object a real COM object, only minor
details are missing:

• Allow reference counting on our exposed interfaces.

• Allow multiple interfaces on an object.

• Use the standard IClassFactory interface on the class factory object.

• Use the __stdcall calling convention.

• Allow for dynamic unloading of a DLL.

• Allow for self-registration of an object.

None of these features requires a lot of implementation, and their
implementation is highly reusable. They take more space to explain than to
implement, and once you have implemented them for one object (as I did for this
sample) you can reuse most of the implementation for all of your objects. The
class factory, the loading and unloading code, and registration support simply
require you to adjust the name of the class and the GUIDs involved.

Reference Counting

Suppose that our little database object will be used by several clients
simultaneously. Currently we could return the same instance of CDB to all calls to
IDBSrvFactory::CreateDB and all of the documents of our client would access
the same object. But problems will arise if one of the clients calls Release on our
object—the object would destroy itself, and all other clients would perform calls
on a nonexistent object.

The solution to this problem is so simple that it is required for all COM objects:
The object maintains a count of how many pointers to itself it gave away, and
when Release is called, it decrements this reference count. If the count reaches
zero, the object has no reason to exist anymore (at least, concerning external
references), and it is free to destroy itself.

How does the object count the references when it gives away a pointer? One way
would be to have the class factory object and the object work together: The class
factory object increments the object's reference count whenever it gives away an
external reference on a call to CreateInstance. But this approach would have a
fairly limited application. Therefore, the object actually exposes another function
to increment the reference count: Whoever has a pointer to the object can tell
the object that she or he just gave the pointer to somebody else. This entity can
be the class factory object or any other client. The function referred to above has
a very simple name: AddRef.

Therefore, the two required member functions for managing reference counts on
any COM object are:

ULONG AddRef();
ULONG Release();

Strictly speaking, an object used only within one process could do without these
functions. However, the approach makes so much sense for any object, even the
tiniest, and the implementation of those functions is so inexpensive, that you
should just implement them.

If you want your object to be accessible remotely (from another process and/or
another machine), your object must provide these functions. The cost is very low
for the benefit you receive, both in terms of a programming philosophy within
your own code and in terms of migration to a distributed environment.
These two functions do not return HRESULT, because they cannot really fail,
unless the object does not exist anymore. And if so, who is going to give back an
error code? Inasmuch as these functions do not need a return value, they simply
return the current reference count of the object, but only if the object feels that it
is necessary to do so. In other words, the object is not even required to return a
reasonable value. This leaves the return value useful only for debugging
purposes; you should not use it for anything in your program.
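A minimal sketch of the two functions follows. CRefCounted is a hypothetical class, not part of the sample; std::atomic stands in for the InterlockedIncrement/InterlockedDecrement calls that a Win32 implementation would typically use; the count is assumed to start at 1 on behalf of the creator; and s_cLiveObjects exists only so that the object's destruction can be observed.

```cpp
#include <atomic>
#include <cassert>

class CRefCounted {
public:
    static CRefCounted* Create() { return new CRefCounted; }   // caller owns the first reference

    unsigned long AddRef() { return ++m_cRef; }   // someone received another copy of the pointer

    unsigned long Release() {
        assert(m_cRef > 0);               // releasing a dead object is a client bug
        unsigned long cRef = --m_cRef;
        if (cRef == 0) delete this;       // no references left: the object destroys itself
        return cRef;                      // informational only; do not build logic on it
    }

    static int s_cLiveObjects;            // observation hook for this sketch

private:
    CRefCounted() : m_cRef(1) { ++s_cLiveObjects; }
    ~CRefCounted() { --s_cLiveObjects; }  // private: only Release may destroy the object
    std::atomic<unsigned long> m_cRef;
};

int CRefCounted::s_cLiveObjects = 0;
```

Making the destructor private means no client can delete the object behind the other clients' backs; Release is the only way to end its life.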

Multiple Interfaces

In the next section we will take a look in greater depth at providing multiple
interfaces. For now, let's suppose that an object wants to return different
interfaces for different clients. The clients would need some way to ask the object
for a specific interface, and in fact we already introduced a method for doing so—
the IID passed to DllGetClassObject. Remember that the client passed an
interface ID to the class factory object, and in the object we validated whether it
was our class factory interface.

This is a good approach for a class factory: One client can obtain the class factory
object, asking for one interface, perhaps, so that the client can instantiate an
object without a user ID or password, and another client can create the object
through an interface that passes in UserID/Password. (This is not a very useful
example; it is just to explain why there could be multiple interfaces on an object,
and why this mechanism is also provided for the class factory object.)

If an object wants to expose two different interfaces, we can have the CreateDB
function of the class factory object—the function that actually instantiates an
object—receive another parameter, an IID, and we can create the appropriate
objects based on the requested interface.

But what if a client needs two interfaces on a given object—perhaps one interface
for creating tables and another for reading or writing? It would be great if the
client could ask the object for another interface, once it has a pointer to the first
interface.

To provide this functionality, the object can expose an additional member
function on its initial interface, a member function to which the client could pass
the interface ID it wants and a pointer to where it wants to receive the new
interface:

HRESULT QueryInterface(REFIID riid, void** ppObj);

On receiving a call to this function, the object could check the interface ID and
return a pointer to a vtable that implements the requested functionality. The
object can provide this pointer however it pleases, as long as the contract
expressed by the interface ID is fulfilled. In other words, the order of the
functions and their parameters must be correct.

The object can create another C++ object that exposes a different interface but
works as a client of the real object, and return a pointer to it. It is important
that the returned interface work on the same logical object: If you delete a table
through the second interface, the first interface must show the changes. (This is
actually one way of implementing multiple interfaces on a COM object: composing
a COM object of multiple interrelated C++ objects. This is the most tedious, but
also the most flexible, way of doing it. Later we will see another approach using
multiple inheritance from multiple abstract base classes, and another technique
using C++ nested classes. You could even set up a table of pointers with the
addresses of the functions of the interface, and return the address of a pointer
to this table. You must do this if you want to implement COM objects in plain C.)

The idea behind exposing multiple interfaces through QueryInterface is to allow
for different ways of seeing the same object, not to access separate objects.
Separate objects should be assigned individual CLSIDs. They can share the same
interface: A database object in memory could allow the same methods as a
database object working on a file. The two objects could be accessed through the
same interface (IDB) but each have a unique CLSID for reaching the code that
implements them. The file-based database would probably provide an additional
interface for setting the filename and committing changes to the file.

If an object is actually a sub-object of another object (for example, an object
that represents a given table of the database), it can simply be returned by a
member function of its parent object. That is, a Read function could return a
pointer to a table object instead of returning a row of data.

If this sounds a little confusing, wait until the next section, where we will see the
real-world benefit of this feature. For now, I hope I have convinced you to
implement this way of asking for other interfaces to an object. It is especially
important if you think about a world of component objects communicating with
each other and being able to inquire into each others' level of functionality, the
version of that functionality, and so on.

Reference Counting and Support for Multiple Interfaces: IUnknown

We have seen two great features to have on any object: Reference Counting
(AddRef and Release) and multiple interfaces (QueryInterface). COM requires
any object to implement these three functions, in the sense I mentioned above. It
makes a lot of sense to have them on any object, and without them COM cannot
handle remote processing for your object.

In order to formalize this set of requirements, COM defines a standard interface
called IUnknown (declared by including OLE2.H):

class IUnknown {
public:
virtual HRESULT QueryInterface(REFIID riid, void** ppObj) =0;
virtual ULONG AddRef() =0;
virtual ULONG Release() =0;
};
An object can simply derive from IUnknown and implement the three functions.
Another way to implement IUnknown is to add the three functions to the
beginning of your custom interface. Let's use IDB as an example:

class IDB {
// Interfaces
public:
// Interfaces for COM and useful anyway
virtual HRESULT QueryInterface(REFIID riid, void** ppObj) =0;
virtual ULONG AddRef() =0;
virtual ULONG Release() =0;
// Interface for data access
virtual HRESULT Read(short nTable, short nRow, LPWSTR lpszData) =0;
(...)
};

Another way of achieving the same goal is to simply derive IDB from IUnknown,
that is, to derive one abstract base class from another abstract base class, as
follows:

class IDB : public IUnknown {
// Interfaces
public:
// Interface for data access
virtual HRESULT Read(short nTable, short nRow, LPWSTR lpszData) =0;
(...)
};
This approach simply combines the two vtables into one: IDB includes all the
functions of IUnknown; IDB has the same three functions at the beginning of its
vtable. These functions are polymorphic, even on the binary level.

With this in mind, we will re-implement our DBAlmostCOM to make it DBCOM.

For QueryInterface, we already have two possible answers: The last sample we
created can satisfy the requests for IDB. But now we can also satisfy the
requests for IUnknown by simply returning IDB. Somebody who does not know
IDB, but just IUnknown, will simply call the first three functions in our vtable,
which happen to implement the same functions that IUnknown does.
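As a sketch of how the shared vtable answers both requests, consider the QueryInterface below. IUnknown, GUID, REFIID, and the HRESULT values are local stand-ins for the OLE2.H definitions; IID_IUnknown has its real value, while IID_IDB is a made-up placeholder. The reference count is assumed to start at 1 for the creator.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-ins for OLE2.H definitions (same values and layout).
typedef std::int32_t HRESULT;
const HRESULT S_OK = 0;
const HRESULT E_NOINTERFACE = static_cast<HRESULT>(0x80004002u);
struct GUID { std::uint32_t Data1; std::uint16_t Data2, Data3; std::uint8_t Data4[8]; };
typedef const GUID& REFIID;
inline bool operator==(const GUID& a, const GUID& b) { return std::memcmp(&a, &b, sizeof(GUID)) == 0; }

// IID_IUnknown has its real value; IID_IDB is a made-up placeholder.
const GUID IID_IUnknown = { 0x00000000, 0x0000, 0x0000, { 0xC0, 0, 0, 0, 0, 0, 0, 0x46 } };
const GUID IID_IDB = { 0x12345678, 0x1234, 0x1234, { 1, 2, 3, 4, 5, 6, 7, 8 } };

class IUnknown {
public:
    virtual HRESULT QueryInterface(REFIID riid, void** ppObj) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
};

class IDB : public IUnknown {   // IDB's vtable begins with IUnknown's three entries
public:
    virtual HRESULT Read(short nTable, short nRow, char* pszData) = 0;   // rest of IDB omitted
};

class CDB : public IDB {
public:
    HRESULT QueryInterface(REFIID riid, void** ppObj) override {
        assert(ppObj != nullptr);
        if (riid == IID_IUnknown) {
            *ppObj = static_cast<IUnknown*>(this);
        } else if (riid == IID_IDB) {
            *ppObj = static_cast<IDB*>(this);   // same address: one vtable serves both
        } else {
            *ppObj = nullptr;                   // clear the out-parameter on failure
            return E_NOINTERFACE;
        }
        AddRef();                               // every pointer handed out is counted
        return S_OK;
    }
    unsigned long AddRef() override { return ++m_cRef; }
    unsigned long Release() override {
        unsigned long cRef = --m_cRef;
        if (cRef == 0) delete this;
        return cRef;
    }
    HRESULT Read(short, short, char* pszData) override { pszData[0] = '\0'; return S_OK; }
private:
    std::atomic<unsigned long> m_cRef{1};       // assumption: the creator holds the first reference
};
```

Both successful branches yield the same address, because IDB's vtable begins with IUnknown's three entries; a failed request leaves *ppObj set to NULL and the reference count unchanged.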

Again, the real benefit of this approach will not be seen until the next sample, in
which we implement really useful multiple interfaces.

Standard Class Factory Interface: IClassFactory

Up to now, our CDBSrvFactory class factory object has been very specialized.
Its only member function is

HRESULT CreateDB(IDB** ppObj)

which it exposes through a very specialized interface, IDBSrvFactory. Each
object would need a specialized interface because the initial interface returned
would probably be different for each object. It would be better if the caller of the
CreateXXX function could specify the initial interface on the actual object, just as
COM does for the class factory:

HRESULT MyCreateInstance(REFIID riid, void** ppObj);

The class factory object could check the IID and instantiate the appropriate
object, implementing the required interface, or it could create an object and call
QueryInterface on that object to obtain the requested IID.

It is important to distinguish between the class ID (CLSID) and the interface ID
(IID): The class ID refers to a logical object that provides a given functionality
through a given interface, possibly multiple interfaces. The interface ID refers to
a specific layout of a vtable that is used to "talk" to the object.

This almost completes our standard class factory interface. Like any COM object,
the class factory object also exports the IUnknown functions for reference
counting and for supporting multiple interfaces:

class IClassFactory : public IUnknown {
public:
virtual HRESULT CreateInstance(IUnknown *pUnkOuter,
REFIID riid, void** ppvObject) = 0;
virtual HRESULT LockServer(BOOL fLock) = 0;
};

Well, we almost have it: CreateInstance has an additional parameter,
pUnkOuter, that allows for a sophisticated reuse mechanism called aggregation.
We will not deal with aggregation in these samples, so we just pass in NULL and
when we receive a call, we will check for NULL, and fail if pUnkOuter is not NULL.
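A sketch of CreateInstance under these rules follows. The aggregation check returns CLASS_E_NOAGGREGATION, the standard COM code for this case; the new object is created with a reference count of 1, QueryInterface adds the reference handed to the caller, and the creation reference is then released, so a request for an unsupported IID destroys the object again. All Windows types, values, and IIDs are local stand-ins (IID_IUnknown and IID_IClassFactory carry their real values); CDB exposes only IUnknown for brevity, and LockServer is left as a placeholder. This is an illustration, not the sample's actual code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>

// Stand-ins for OLE2.H definitions (same values and layout).
typedef std::int32_t HRESULT;
typedef int BOOL;
const HRESULT S_OK = 0;
const HRESULT E_NOINTERFACE = static_cast<HRESULT>(0x80004002u);
const HRESULT E_OUTOFMEMORY = static_cast<HRESULT>(0x8007000Eu);
const HRESULT CLASS_E_NOAGGREGATION = static_cast<HRESULT>(0x80040110u);
struct GUID { std::uint32_t Data1; std::uint16_t Data2, Data3; std::uint8_t Data4[8]; };
typedef const GUID& REFIID;
inline bool operator==(const GUID& a, const GUID& b) { return std::memcmp(&a, &b, sizeof(GUID)) == 0; }
const GUID IID_IUnknown = { 0x00000000, 0x0000, 0x0000, { 0xC0, 0, 0, 0, 0, 0, 0, 0x46 } };
const GUID IID_IClassFactory = { 0x00000001, 0x0000, 0x0000, { 0xC0, 0, 0, 0, 0, 0, 0, 0x46 } };

class IUnknown {
public:
    virtual HRESULT QueryInterface(REFIID riid, void** ppv) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
};

class IClassFactory : public IUnknown {
public:
    virtual HRESULT CreateInstance(IUnknown* pUnkOuter, REFIID riid, void** ppvObject) = 0;
    virtual HRESULT LockServer(BOOL fLock) = 0;
};

// The object being created; for brevity it exposes only IUnknown.
class CDB : public IUnknown {
public:
    HRESULT QueryInterface(REFIID riid, void** ppv) override {
        if (!(riid == IID_IUnknown)) { *ppv = nullptr; return E_NOINTERFACE; }
        *ppv = static_cast<IUnknown*>(this);
        AddRef();
        return S_OK;
    }
    unsigned long AddRef() override { return ++m_cRef; }
    unsigned long Release() override {
        unsigned long cRef = --m_cRef;
        if (cRef == 0) delete this;
        return cRef;
    }
private:
    std::atomic<unsigned long> m_cRef{1};   // the creator's reference
};

class CDBSrvFactory : public IClassFactory {
public:
    HRESULT QueryInterface(REFIID riid, void** ppv) override {
        if (riid == IID_IUnknown) {
            *ppv = static_cast<IUnknown*>(this);
        } else if (riid == IID_IClassFactory) {
            *ppv = static_cast<IClassFactory*>(this);
        } else {
            *ppv = nullptr;
            return E_NOINTERFACE;
        }
        AddRef();
        return S_OK;
    }
    unsigned long AddRef() override { return ++m_cRef; }
    unsigned long Release() override {
        unsigned long cRef = --m_cRef;
        if (cRef == 0) delete this;
        return cRef;
    }
    HRESULT CreateInstance(IUnknown* pUnkOuter, REFIID riid, void** ppvObject) override {
        assert(ppvObject != nullptr);
        *ppvObject = nullptr;
        if (pUnkOuter != nullptr) return CLASS_E_NOAGGREGATION;   // no aggregation support
        CDB* pDB = new (std::nothrow) CDB;                          // count is 1
        if (pDB == nullptr) return E_OUTOFMEMORY;
        HRESULT hr = pDB->QueryInterface(riid, ppvObject);          // count is 2 on success
        pDB->Release();                                             // 1 on success; destroyed on failure
        return hr;
    }
    HRESULT LockServer(BOOL) override { return S_OK; }   // placeholder: would adjust a global lock count
private:
    std::atomic<unsigned long> m_cRef{1};   // the creator's reference
};
```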

Note: Aggregation allows you to effectively merge two objects and make them
appear as one to a client. An “outer object” receives all the calls to IUnknown
(reference count and queries for new interfaces) and can—usually—selectively
return pointers to an “inner object” to the client. If the client calls a function from
IUnknown on the inner object, the inner object must call back to the outer
unknown (hence the parameter pUnkOuter at creation time of the inner object),
in order for the two objects to be perceived as identical. See the Platform SDK or
the OLE Design Specification for details.
Another function we have passed over is LockServer. This function lets the client
keep alive the module that implements the class factory object. Why do so, if
reference counting through IUnknown's AddRef/Release methods lets the
object know if someone still needs it? Some objects (such as local servers
implemented in an EXE) do not count references to the class factory object as
sufficient to keep the code for the object in memory; they exit when nothing but
a class factory remains. Clients can keep these kinds of objects alive by calling
LockServer(TRUE) on a class factory they want to keep using, and LockServer(FALSE)
when they are done with it.

For now, we will implement LockServer by maintaining a global reference count for
the DLL that combines all outstanding references, whether to the database
object(s) or to the class factory object(s), together with any outstanding locks.

Dynamic Unloading of an Object's Code

In this sample, our client requires that the object be loaded while the client is
running, unless the user closes all the documents. In that case, the DLL is no
longer used by the client. If we had to optimize memory resources, we could
unload the DLL in this situation.

With implicit linking, there is no way to do this, and using the COM Libraries for
loading our DLL leaves us without access to the LoadLibrary/FreeLibrary calls.
COM takes the responsibility for unloading the code, which it does by querying
the DLLs it has loaded to see if they are still in use. COM calls another DLL
exported function, DllCanUnloadNow, and expects the DLL to answer S_OK or
S_FALSE.

We will implement this function by maintaining a global reference count.
Whenever a pointer is given away, whether to a class factory or to a database
object, we call AddRef to increment both the object's reference count and a
global variable containing the sum of all reference counts. DllCanUnloadNow simply
checks whether the global reference count is 0: if it is, the function returns S_OK,
and COM knows that it can safely unload this DLL; otherwise it returns S_FALSE.

The COM libraries check the modules on behalf of the client: The client should call
CoFreeUnusedLibraries periodically if it wants to ensure that COM unloads
unused libraries.
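As a rough model of this bookkeeping, the sketch below keeps one module-wide counter that every AddRef, Release, and LockServer call adjusts, and answers DllCanUnloadNow from it. It is platform-independent and hypothetical: DBObject stands in for the database object, HResult for HRESULT, and FreeIfUnused for what COM does on the client's behalf when the client calls CoFreeUnusedLibraries. Like the article's sample, the counting is not thread-safe.

```cpp
#include <cassert>

// S_OK and S_FALSE really are 0 and 1; HResult is a portable stand-in for HRESULT.
using HResult = long;
constexpr HResult kS_OK = 0;
constexpr HResult kS_FALSE = 1;

// One counter for the whole module: outstanding references to any object
// (database or class factory) plus outstanding LockServer(TRUE) calls.
unsigned long g_dwRefCount = 0;

class DBObject final {
public:
    unsigned long AddRef() { ++g_dwRefCount; return ++m_dwRefCount; }
    unsigned long Release() {
        --g_dwRefCount;
        unsigned long n = --m_dwRefCount;
        if (n == 0) delete this;
        return n;
    }
private:
    unsigned long m_dwRefCount = 0;
};

HResult LockServer(bool fLock) {
    if (fLock) ++g_dwRefCount; else --g_dwRefCount;
    return kS_OK;
}

HResult DllCanUnloadNow() {
    return g_dwRefCount == 0 ? kS_OK : kS_FALSE;
}

// Hypothetical model of COM polling one loaded DLL: ask it, and unload it
// only if it answers S_OK. Returns true once the DLL is unloaded.
bool FreeIfUnused(bool& loaded) {
    if (loaded && DllCanUnloadNow() == kS_OK) loaded = false;
    return !loaded;
}
```

A lock taken with LockServer keeps the module loaded even after the last object is released, which is exactly what a client holding on to a class factory wants.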

Self-Registration

It is very practical for an object to provide for self-registration. This facilitates
installation (and de-installation!) of an object. The idea is very simple: Two
standard entry points, DllRegisterServer and DllUnregisterServer, provide
this functionality and can be called by any program that needs to register a DLL.
The regsvr32.exe utility provided with the OLE Controls CDK does nothing more
than call one of these functions on a DLL.

The implementation of these functions can be a little tedious due to the nature of
the registry (or is it the Win32 API used for accessing it?), but is nonetheless
straightforward.
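Much of the tedium in DllRegisterServer is building key names such as CLSID\{...}\InprocServer32 from a GUID. The portable sketch below assumes its own Guid struct and two hypothetical helpers, GuidToString and InprocServerKey; on Windows you would more likely use StringFromGUID2 together with the Reg* functions. It emits uppercase hex digits, which is harmless because registry key names are case-insensitive.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Portable stand-in for the SDK's GUID structure.
struct Guid { std::uint32_t d1; std::uint16_t d2, d3; std::uint8_t d4[8]; };

// Hypothetical helper: formats a GUID as {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}.
std::string GuidToString(const Guid& g) {
    char buf[39];  // 38 characters plus the terminating NUL
    std::snprintf(buf, sizeof buf,
                  "{%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
                  static_cast<unsigned>(g.d1), static_cast<unsigned>(g.d2),
                  static_cast<unsigned>(g.d3), g.d4[0], g.d4[1], g.d4[2],
                  g.d4[3], g.d4[4], g.d4[5], g.d4[6], g.d4[7]);
    return buf;
}

// Hypothetical helper: the subkey DllRegisterServer creates under HKEY_CLASSES_ROOT.
std::string InprocServerKey(const Guid& clsid) {
    return "CLSID\\" + GuidToString(clsid) + "\\InprocServer32";
}
```

DllRegisterServer would then create that key and store the DLL's path as its default value, while DllUnregisterServer deletes the InprocServer32 subkey before the CLSID key itself.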

Changes: Step by Step

1. Derive IDB from IUnknown. Remove the old declaration of Release from
IDB. Add _stdcall to all members of IDB, since this is the standard calling
convention for COM objects under Win32.

2. Remove the declaration of IDBSrvFactory. We will now use the standard
class factory interface, IClassFactory. Also remove
IID_IDBSrvFactory.

3. Create a new IID_IDB with GUIDGen (or use one of the GUIDs that you
generated in advance). Do not reuse IID_IDBSrvFactory: GUIDs should
never be reused, because they define a unique contract and they are not a
limited resource. Add the declaration and the definitions to interface\dbsrv.h,
object\dbsrvfact.cpp, and client\dbdoc.cpp, respectively.

The new Interface\DBSRV.H looks like this:

(...)
// {30DF3432-0266-11cf-BAA6-00AA003E0EED}
extern const GUID IID_IDB;
// { 0x30df3432, 0x266, 0x11cf, { 0xba, 0xa6, 0x0, 0xaa, 0x0, 0x3e, 0xe, 0xed } };
class IDB : public IUnknown {
// Interfaces
public:
// Interface for data access
virtual HRESULT _stdcall Read(short nTable, short nRow, LPWSTR lpszData) =0;
virtual HRESULT _stdcall Write(short nTable, short nRow, LPCWSTR lpszData) =0;
// Interface for database management
virtual HRESULT _stdcall Create(short &nTable, LPCWSTR lpszName) =0;
virtual HRESULT _stdcall Delete(short nTable) =0;
// Interface for retrieving information about the database
virtual HRESULT _stdcall GetNumTables(short &nNumTables) =0;
virtual HRESULT _stdcall GetTableName(short nTable, LPWSTR lpszName) =0;
virtual HRESULT _stdcall GetNumRows(short nTable, short &nRows) =0;
// Release is now inherited from IUnknown:
//virtual ULONG Release() =0;
};

4. Derive CDBSrvFactory from IClassFactory instead of from
IDBSrvFactory.

5. Change CDBSrvFactory::CreateDB to
CDBSrvFactory::CreateInstance, and add a member function called
CDBSrvFactory::LockServer.

6. Add a ULONG m_dwRefCount member to both CDB and
CDBSrvFactory for their respective reference counts. Also add a
constructor to both classes and initialize m_dwRefCount to 0.

7. Add a global variable, ULONG g_dwRefCount: declare it extern in
dbsrvimp.h and define it in dbsrvfact.cpp.

8. Add QueryInterface, AddRef, and Release member functions to both
CDB and CDBSrvFactory. (The order of the declarations in the
implementation header file does not affect the order in the vtable. The
vtable is defined by the order of declarations in IDB!)

The new Object\DBSrvImp.h looks like this:

class CDB : public IDB {
// Interfaces
public:
// Interface for data access
HRESULT _stdcall Read(short nTable, short nRow, LPWSTR lpszData);
(...)
HRESULT _stdcall GetNumRows(short nTable, short &nRows);
HRESULT _stdcall QueryInterface(REFIID riid, void** ppObject);
ULONG _stdcall AddRef();
ULONG _stdcall Release();
// Implementation
private:
CPtrArray m_arrTables; // Array of pointers to CStringArray
CStringArray m_arrNames; // Array of table names
ULONG m_dwRefCount;

public:
CDB();
~CDB();
};

extern ULONG g_dwRefCount;

class CDBSrvFactory : public IClassFactory {
// Interface
public:
HRESULT _stdcall QueryInterface(REFIID riid, void** ppObject);
ULONG _stdcall AddRef();
ULONG _stdcall Release();
HRESULT _stdcall CreateInstance(IUnknown *pUnkOuter, REFIID riid,
void** ppObject);
HRESULT _stdcall LockServer(BOOL fLock);

// Implementation
private:
ULONG m_dwRefCount;

public:
CDBSrvFactory();
};

9. Implement AddRef, Release, and QueryInterface for CDB and
CDBSrvFactory.

10. Change CDBSrvFactory::CreateDB to CreateInstance, and validate the
new parameters.

11. Implement CDBSrvFactory::LockServer.

Here is the new implementation in DBSRV.CPP:

CDB::CDB() {
m_dwRefCount=0;
}
HRESULT CDB::QueryInterface(REFIID riid, void** ppObject) {
if (riid==IID_IUnknown || riid==IID_IDB) {
*ppObject=(IDB*) this;
}
else {
*ppObject=NULL; // On failure, COM requires the out-pointer to be NULL.
return E_NOINTERFACE;
}

AddRef();
return NO_ERROR;
}

ULONG CDB::AddRef() {
g_dwRefCount++;
m_dwRefCount++;
return m_dwRefCount;
}

ULONG CDB::Release() {
g_dwRefCount--;
m_dwRefCount--;

if (m_dwRefCount==0) {
delete this;
return 0;
}

return m_dwRefCount;
}
And the new implementation in DBSRVFact.cpp:

ULONG g_dwRefCount=0;

// {30DF3430-0266-11cf-BAA6-00AA003E0EED}
static const GUID CLSID_DBSAMPLE =
{ 0x30df3430, 0x266, 0x11cf, { 0xba, 0xa6, 0x0, 0xaa, 0x0, 0x3e, 0xe,
0xed } };
// Create a new database object and return a pointer to it.
HRESULT CDBSrvFactory::CreateInstance(IUnknown *pUnkOuter, REFIID riid,
void** ppObject)
{
if (pUnkOuter!=NULL) {
return CLASS_E_NOAGGREGATION;
}

CDB* pDB=new CDB;

if (FAILED(pDB->QueryInterface(riid, ppObject))) {
delete pDB;
*ppObject=NULL;
return E_NOINTERFACE;
}

return NO_ERROR;
}

HRESULT CDBSrvFactory::LockServer(BOOL fLock) {


if (fLock) {
g_dwRefCount++;
}
else {
g_dwRefCount--;
}

return NO_ERROR;
}

CDBSrvFactory::CDBSrvFactory() {
m_dwRefCount=0;
}

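HRESULT CDBSrvFactory::QueryInterface(REFIID riid, void** ppObject) {
if (riid==IID_IUnknown || riid==IID_IClassFactory) {
*ppObject=(IClassFactory*) this; // The factory does not derive from IDB.
}
else {
*ppObject=NULL; // On failure, COM requires the out-pointer to be NULL.
return E_NOINTERFACE;
}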

AddRef();
return NO_ERROR;
}

ULONG CDBSrvFactory::AddRef() {
g_dwRefCount++;
m_dwRefCount++;
return m_dwRefCount;
}
ULONG CDBSrvFactory::Release() {
g_dwRefCount--;
m_dwRefCount--;

if (m_dwRefCount==0) {
delete this;
return 0;
}

return m_dwRefCount;
}

STDAPI DllGetClassObject(REFCLSID rclsid, REFIID riid, void** ppObject) {
*ppObject=NULL;
if (rclsid!=CLSID_DBSAMPLE) {
return CLASS_E_CLASSNOTAVAILABLE;
}

CDBSrvFactory *pFactory= new CDBSrvFactory;

HRESULT hRes=pFactory->QueryInterface(riid, ppObject);
if (FAILED(hRes)) {
delete pFactory;
*ppObject=NULL;
return hRes; // E_NOINTERFACE: the factory does not support riid.
}

return NO_ERROR;
}
Note that my implementations of CreateInstance and
DllGetClassObject do not verify the IID themselves but let the objects do
the work by using QueryInterface on the newly created objects. This
makes the implementation very reusable: You just change the name of the
class and everything works fine. If you add more interfaces to the object,
you only have to change QueryInterface in the object.

QueryInterface does an implicit AddRef, since it returns another pointer
to the same object. Since we always use QueryInterface after creating
an object, we initialize m_dwRefCount to 0, so that the first successful
QueryInterface brings it to 1.

This is just one way of implementing IUnknown, but a very modular one.

12. Add DllCanUnloadNow, DllRegisterServer, and DllUnregisterServer
to dbsrvfact.cpp and export them in DB.DEF.

STDAPI DllCanUnloadNow(void) {
if (g_dwRefCount) {
return S_FALSE;
}
else {
return S_OK;
}
}
STDAPI DllRegisterServer(void) {
HKEY hKeyCLSID, hKeyInproc32;
DWORD dwDisposition;
if (RegCreateKeyEx(HKEY_CLASSES_ROOT,
_T("CLSID\\{30DF3430-0266-11cf-BAA6-00AA003E0EED}"),
0, NULL, REG_OPTION_NON_VOLATILE, KEY_ALL_ACCESS, NULL,
&hKeyCLSID, &dwDisposition)!=ERROR_SUCCESS) {
return E_UNEXPECTED;
}
(. . .) // See dbsrvfact.cpp for details.
return NOERROR;
}
STDAPI DllUnregisterServer(void) {
// RegDeleteKey cannot delete a key that still has subkeys (on Windows NT),
// so remove InprocServer32 first, then the CLSID key itself.
if (RegDeleteKey(HKEY_CLASSES_ROOT,
_T("CLSID\\{30DF3430-0266-11cf-BAA6-00AA003E0EED}\\InprocServer32"))!=ERROR_SUCCESS) {
return E_UNEXPECTED;
}
if (RegDeleteKey(HKEY_CLASSES_ROOT,
_T("CLSID\\{30DF3430-0266-11cf-BAA6-00AA003E0EED}"))!=ERROR_SUCCESS) {
return E_UNEXPECTED;
}
return NOERROR;
}

13. Add uuid.lib in "Linker - Object/Library modules" to import the definitions
of IID_IUnknown and IID_IClassFactory.

14. Client: Change DBDOC.CPP to obtain the class factory object through COM and
use it to create the database object, asking for IDB. (There is a helper API in COM,
CoCreateInstance, that combines all these calls in one. To show the
technical details, I chose to implement it here step by step.)

15. Client: Add the definition of IID_IDB to DBDOC.CPP.

16. Client: Add CDBApp::OnIdle (using ClassWizard). During idle processing,
call CoFreeUnusedLibraries to make sure that any DLLs loaded by COM
that do not have any references to them get unloaded.

BOOL CDBApp::OnIdle(LONG lCount)
{
if (CWinApp::OnIdle(lCount)) {
return TRUE;
}
CoFreeUnusedLibraries();
return FALSE;
}

17. Client: Add uuid.lib to the Linker - Object/Library Modules section, for
IID_IClassFactory.

The new creation code in client\dbdoc.cpp looks like this:

(...)

IClassFactory *pDBFactory=NULL;
HRESULT hRes;
hRes=CoGetClassObject(CLSID_DBSAMPLE, CLSCTX_SERVER, NULL,
IID_IClassFactory,(void**) &pDBFactory);
if (FAILED(hRes)) {
CString csError;
csError.Format(_T("Error %x obtaining class factory for DB Object!"),
hRes);
AfxMessageBox(csError);
return FALSE;
}
hRes=pDBFactory->CreateInstance(NULL, IID_IDB, (void**) &m_pDB);
pDBFactory->Release(); // Do not need the factory anymore, success or not.
if (FAILED(hRes)) {
CString csError;
csError.Format(_T("Error %x creating DB Object!"), hRes);
AfxMessageBox(csError);
return FALSE;
}
(...)
Compile both the client and the object, register DB.DLL with regsvr32.exe, and
run the client. Again, you can mix and match Unicode/ASCII and Release/Debug
versions!

This sample involved more object philosophy than code, and your object design can
benefit from that philosophy even if you do not use COM. Reference counting on
objects and managing multiple interfaces on a single object are both useful
concepts. The rest of the sample illustrated standard "infrastructure,"
such as standard entry points and self-registration, that you can also reuse in your
own designs.

DBCOMMUL: COM Object with Multiple Interfaces

Theory

The previous section introduced the QueryInterface function, which must be
present on any COM interface. This function allows a client to ask the object for
different pointers to itself, each pointing to a different abstract base class (also
known as an interface). The only interfaces we have implemented so far are
IUnknown and IDB.

We will now look at a really useful way of implementing multiple interfaces. In the
previous sample, IDB was just a superset of IUnknown. The IDB interface
basically consists of three semantically related sections:

• Functions for accessing a table.

• Functions that allow creation and deletion of tables (that is, they manage
the database).

• Functions that return information on the database and a table.

Suppose different clients have different ways of using our object. For example,
some clients may just want to read and write to existing tables. Others may want
to create and delete tables but not to read their contents. Having additional
functions on the interface to the object is not a major overhead cost, but it has the
disadvantage of exposing all the complexity of an object to all its users: A
programmer who accesses the content of a table gets to see all the functions for
managing the database. In our case, the functions are not tremendously
complex, but real objects could expose hundreds of different functions.

To show the general technique behind grouping of member functions, we will
break down our IDB interface into three new interfaces, each derived from
IUnknown:
class IDBAccess : public IUnknown {
public:
// Interface for data access
virtual HRESULT _stdcall Read(short nTable, short nRow, LPWSTR lpszData)
=0;
virtual HRESULT _stdcall Write(short nTable, short nRow,
LPCWSTR lpszData) =0;
};
class IDBManage : public IUnknown {
// Interface for database management
public:
virtual HRESULT _stdcall Create(short &nTable, LPCWSTR lpszName) =0;
virtual HRESULT _stdcall Delete(short nTable) =0;
};
class IDBInfo : public IUnknown {
// Interface for retrieval of information about the database.
public:
virtual HRESULT _stdcall GetNumTables(short &nNumTables) =0;
virtual HRESULT _stdcall GetTableName(short nTable, LPWSTR lpszName) =0;
virtual HRESULT _stdcall GetNumRows(short nTable, short &nRows) =0;
};
These three abstract base classes define three different vtable layouts:
IDBAccess has five entries in its vtable (the three IUnknown functions plus its two
own functions), IDBManage also has five entries, and IDBInfo has
six entries: three for IUnknown and three functions of its own.

A client should be able to ask our object for any of these three interfaces. We
have to return a pointer to a vtable that contains only the appropriate function
addresses. How can we accomplish this?

The easiest way to achieve this is through multiple inheritance. We simply derive
our CDB implementation class from all the base classes that we want to provide
interfaces for: IDB, IDBAccess, IDBManage, and IDBInfo. We keep IDB for
providing backwards compatibility for existing clients.

What do multiple abstract base classes mean from the perspective of C++? If I
have a pointer to an object cast to one of its base classes, I can call the correct
member functions through the abstract base class. C++ implements this through
one vtable per class; the object stores a pointer to that vtable in its instance data.
If I have a pointer to the same object, but cast to another abstract base class, I am
able to do the same: call the members through their vtable position. So the memory
that this second cast pointer points to must also begin with a pointer to a vtable.

How does C++ handle multiple pointers to different base classes? It provides
multiple vtables, one for each base class, and multiple vtable pointers within the
object's instance data. When casting a pointer, the compiler simply adds the offset
to the correct part of the object's instance data. In our case, the object derives from
four abstract base classes, and thus has four vtable pointers at the beginning of
its instance data.

For example, under Win32, where a vtable pointer occupies 4 bytes, the compiler
adds 0 when casting from CDB to IDB; it adds 4 to the pointer when casting from
CDB to IDBAccess, 8 when casting from CDB to IDBManage, and so
forth.

There is one dedicated vtable for each interface on CDB, four in total, shared by all
instances of CDB. The compiler initializes these vtables with the correct function
addresses. If two abstract base classes include the same function (such as Read
in IDB and IDBAccess, or AddRef in all four interfaces), every vtable leads to the
same implementation (where needed, through a small "adjustor thunk" that converts
the interface pointer back into a CDB pointer). One implementation of AddRef is thus
called no matter which base interface's AddRef the client invokes.
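The layout described above can be observed directly. This sketch uses hypothetical interface-like classes mirroring IDB, IDBAccess, IDBManage, and IDBInfo, measures how far each cast moves the pointer, and shows that AddRef called through two different interface pointers reaches the same counter. The expected offsets (0, one pointer size, two, three) match the usual MSVC, GCC, and Clang layouts, but the C++ standard itself does not guarantee them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Interface-like abstract classes: no data members, so each adds only a vtable pointer.
struct IUnknownLike {
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
};
struct IDBLike : IUnknownLike { virtual int Read() = 0; };
struct IDBAccessLike : IUnknownLike { virtual int Read() = 0; };
struct IDBManageLike : IUnknownLike { virtual int Create() = 0; };
struct IDBInfoLike : IUnknownLike { virtual int GetNumTables() = 0; };

struct DB final : IDBLike, IDBAccessLike, IDBManageLike, IDBInfoLike {
    unsigned long AddRef() override { return ++refs; }   // serves all four bases
    unsigned long Release() override { return --refs; }
    int Read() override { return 1; }                     // serves IDB and IDBAccess
    int Create() override { return 2; }
    int GetNumTables() override { return 3; }
    unsigned long refs = 0;
};

// Byte distance the compiler adds when converting DB* to Base*.
template <class Base>
std::ptrdiff_t CastOffset(DB& obj) {
    auto whole = reinterpret_cast<std::uintptr_t>(&obj);
    auto part = reinterpret_cast<std::uintptr_t>(static_cast<Base*>(&obj));
    return static_cast<std::ptrdiff_t>(part - whole);
}
```

On a 32-bit build the offsets come out as 0, 4, 8, and 12, the Win32 numbers quoted above; on a 64-bit build they double.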

This makes implementing multiple interfaces ridiculously simple: we just derive
CDB from all four interfaces and expand QueryInterface to return the correctly
cast pointer. C++ takes care of the rest.
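Here is a portable sketch of such a QueryInterface for an object with two of the sample's interfaces. The types are stand-ins (Guid, HResult, IUnknownLike, IDBAccessLike, IDBInfoLike), and the object is deliberately trimmed down to IDBAccess and IDBInfo so that a request for IDBManage exercises the failure path; the IID values are the sample's own, as listed in its proxy/stub registry entries. The pointers returned for different interfaces differ, yet all calls land on the same object and the same reference count.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Guid { std::uint32_t d1; std::uint16_t d2, d3; std::uint8_t d4[8]; };
inline bool operator==(const Guid& a, const Guid& b) {
    return a.d1 == b.d1 && a.d2 == b.d2 && a.d3 == b.d3 &&
           std::memcmp(a.d4, b.d4, sizeof a.d4) == 0;
}

// Standard IID_IUnknown; the others are this sample's IIDs.
const Guid kIID_IUnknown = {0x00000000, 0x0000, 0x0000, {0xC0, 0, 0, 0, 0, 0, 0, 0x46}};
const Guid kIID_IDBAccess = {0x30DF3433, 0x0266, 0x11CF, {0xBA, 0xA6, 0x00, 0xAA, 0x00, 0x3E, 0x0E, 0xED}};
const Guid kIID_IDBManage = {0x30DF3434, 0x0266, 0x11CF, {0xBA, 0xA6, 0x00, 0xAA, 0x00, 0x3E, 0x0E, 0xED}};
const Guid kIID_IDBInfo = {0x30DF3435, 0x0266, 0x11CF, {0xBA, 0xA6, 0x00, 0xAA, 0x00, 0x3E, 0x0E, 0xED}};

using HResult = long;
constexpr HResult kOk = 0, kNoInterface = 1;  // stand-ins for S_OK, E_NOINTERFACE

struct IUnknownLike {
    virtual HResult QueryInterface(const Guid& iid, void** ppv) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
};
struct IDBAccessLike : IUnknownLike { virtual HResult Read(short table, short row, int& value) = 0; };
struct IDBInfoLike : IUnknownLike { virtual HResult GetNumTables(short& n) = 0; };

// Trimmed-down object: implements IDBAccess and IDBInfo, but not IDBManage.
class DB final : public IDBAccessLike, public IDBInfoLike {
public:
    HResult QueryInterface(const Guid& iid, void** ppv) override {
        // Cast first so the pointer lands on the matching base's vtable pointer.
        // IUnknown always maps to the same base, so object identity stays stable.
        if (iid == kIID_IUnknown || iid == kIID_IDBAccess)
            *ppv = static_cast<IDBAccessLike*>(this);
        else if (iid == kIID_IDBInfo)
            *ppv = static_cast<IDBInfoLike*>(this);
        else {
            *ppv = nullptr;
            return kNoInterface;
        }
        AddRef();
        return kOk;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long n = --refs_;
        if (n == 0) delete this;
        return n;
    }
    HResult Read(short, short, int& value) override { value = 42; return kOk; }  // placeholder data
    HResult GetNumTables(short& n) override { n = 3; return kOk; }              // placeholder data
private:
    unsigned long refs_ = 0;
};
```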

This section definitely contains more new ideas than new code, and really
understanding what it does will help you a lot in understanding the flexibility and
power of COM.

Note: One of the major drawbacks of this approach to implementing multiple
interfaces is that you can’t provide reference counting on an interface basis. More
complex objects could need to load additional code or initialize additional data
structures, when being asked for more complex functionality. With multiple
inheritance, any call to Release goes to the same function. The object cannot
free the additional code/data, because it does not know which interface pointer
was released. The big advantage of using multiple inheritance for this lies in its
simplicity. All the dangers of normal C++ multiple inheritance, such as multiple
common base classes, and so forth, do not really apply, because interfaces
contain no data members.

Practice

Since we are adding three new interfaces but not a lot of code, I will take the
opportunity to present a more elegant way of managing the interface IDs (in
general, GUIDs). In the previous samples, the GUIDs were not defined in the
Interface\dbsrv.h header file, because multiple "includes" of the header in a
project would have resulted in multiple definitions of the GUIDs. There is a simple
macro, defined through OLE2.H, that helps us provide the GUID in a header file:
DEFINE_GUID. This macro expands in two different ways, depending on an
INITGUID symbol. If this symbol is defined, it expands to a definition of the
GUID. If it is not defined, it expands to a declaration of the GUID without
initializing it.

One source file must contain #define INITGUID before including DBSRV.H, in
order to provide the definition of the symbol. This requires precompiled headers
to be disabled for this source file, otherwise the compiler uses the precompiled
header that includes the wrong macro expansion.
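The mechanism behind DEFINE_GUID is an ordinary preprocessor switch. The sketch below uses a hypothetical SKETCH_DEFINE_GUID macro and SKETCH_INITGUID symbol rather than the real SDK macro (whose exact expansion varies between SDK versions) to show both expansions in a single file that plays the role of guids.cpp. Note the extern on the definition: in C++, a namespace-scope const object otherwise has internal linkage, and other source files could not link to it.

```cpp
#include <cassert>
#include <cstdint>

struct Guid { std::uint32_t d1; std::uint16_t d2, d3; std::uint8_t d4[8]; };

// What guids.cpp does: define the switch before including the shared header.
#define SKETCH_INITGUID

// What the shared header (dbsrv.h in the article) would contain.
#ifdef SKETCH_INITGUID
// Definition: reserves storage and initializes it (in exactly one .cpp file).
#define SKETCH_DEFINE_GUID(name, l, w1, w2, b1, b2, b3, b4, b5, b6, b7, b8) \
    extern const Guid name = {l, w1, w2, {b1, b2, b3, b4, b5, b6, b7, b8}}
#else
// Declaration only: every other file that includes the header.
#define SKETCH_DEFINE_GUID(name, l, w1, w2, b1, b2, b3, b4, b5, b6, b7, b8) \
    extern const Guid name
#endif

SKETCH_DEFINE_GUID(IID_IDBAccess, 0x30DF3433, 0x0266, 0x11CF,
                   0xBA, 0xA6, 0x00, 0xAA, 0x00, 0x3E, 0x0E, 0xED);
```

Every other source file includes the same header without the switch, gets only the extern declaration, and links against the single definition.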

Changes: Step by Step

Keep a copy of the unmodified client to test the backwards compatibility of the
new object!

1. Add definitions for IDBAccess, IDBManage, and IDBInfo to DBSRV.H.

2. Declare IIDs using the DEFINE_GUID macro: Generate new IIDs with
GUIDGEN/UUIDGEN or use one of your unused, pre-generated GUIDs.

3. Derive CDB from IDB, IDBAccess, IDBManage, and IDBInfo using multiple
inheritance (in Object\dbsrvimp.h).

class CDB : public IDB, public IDBAccess, public IDBManage, public IDBInfo {
(...)
};

4. Change CDB::QueryInterface to allow for the new interfaces.

HRESULT CDB::QueryInterface(REFIID riid, void** ppObject) {
if (riid==IID_IUnknown || riid==IID_IDB) {
*ppObject=(IDB*) this;
}
else if (riid==IID_IDBAccess) {
*ppObject=(IDBAccess*) this;
}
else if (riid==IID_IDBManage) {
*ppObject=(IDBManage*) this;
}
else if (riid==IID_IDBInfo) {
*ppObject=(IDBInfo*) this;
}
else {
*ppObject=NULL; // On failure, COM requires the out-pointer to be NULL.
return E_NOINTERFACE;
}

AddRef();
return NO_ERROR;
}

5. Remove the old definitions of the CLSID and the IID in object\dbsrv.cpp
and object\dbsrvfact.cpp. Create a file called Interface\guids.cpp that
#defines INITGUID, #includes ole2.h, and then #includes dbsrv.h. Add
interface\guids.cpp to the project and deactivate precompiled headers for
it.

6. In the client, call CreateInstance with IID_IUnknown and change m_pDB
to IUnknown*.

7. In the client, before calling Create, Read, and Write, use
QueryInterface on m_pDB to obtain the appropriate interface. Release
the obtained interface pointer after using it.

8. In the client project, add Interface\Guids.cpp and deactivate precompiled
headers for it. Remove the declarations of the CLSID and IIDs from
client\dbdoc.cpp.

Test the new object: Register it using regsvr32.exe and run the client. Also, try
running the client that used IDB (from the DBCOM sample). Since we still
support this interface, this client continues to work.

QueryInterface provides a great mechanism for maintaining backward
compatibility between versions of an object. It also provides a method for very
flexible version checking. The client and the server do not check an abstract
"version number" where both have to agree somehow what a specific version
means in terms of functionality and interfaces. A client can check for each specific
feature by asking for an interface through QueryInterface. A client could also
ask first for a more sophisticated new interface, and if the object does not provide
it, the client can ask for an "older" interface and provide a workaround. Using this
mechanism you can provide interoperability between both a new client with an
old server and an old client with a new server.
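On the client side, the "ask for the newer interface, fall back to the older one" pattern might look like this. Everything here is hypothetical: the Iid enum stands in for real GUIDs, IDB2Like is an invented "version 2" interface, and the two servers never delete themselves, to keep lifetimes trivial. The same client function works against both servers and reports which interface it ended up using.

```cpp
#include <cassert>

// Simplified stand-ins: an enum instead of real GUIDs, long instead of HRESULT.
enum class Iid { IUnknown, IDB, IDB2 };
using HResult = long;
constexpr HResult kOk = 0, kNoInterface = 1;

struct IUnknownLike {
    virtual HResult QueryInterface(Iid iid, void** ppv) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
};
struct IDBLike : IUnknownLike { virtual short GetNumTables() = 0; };
// Hypothetical "version 2" interface that only a newer server implements.
struct IDB2Like : IUnknownLike { virtual short GetNumTablesCached() = 0; };

// Old server: knows only IDB. (Release never deletes: lifetimes kept trivial.)
class OldServer final : public IDBLike {
public:
    HResult QueryInterface(Iid iid, void** ppv) override {
        if (iid == Iid::IUnknown || iid == Iid::IDB) {
            *ppv = static_cast<IDBLike*>(this);
            AddRef();
            return kOk;
        }
        *ppv = nullptr;
        return kNoInterface;
    }
    unsigned long AddRef() override { return ++refs; }
    unsigned long Release() override { return --refs; }
    short GetNumTables() override { return 2; }
    unsigned long refs = 0;
};

// New server: still offers IDB for old clients, plus IDB2.
class NewServer final : public IDBLike, public IDB2Like {
public:
    HResult QueryInterface(Iid iid, void** ppv) override {
        if (iid == Iid::IUnknown || iid == Iid::IDB) *ppv = static_cast<IDBLike*>(this);
        else if (iid == Iid::IDB2) *ppv = static_cast<IDB2Like*>(this);
        else { *ppv = nullptr; return kNoInterface; }
        AddRef();
        return kOk;
    }
    unsigned long AddRef() override { return ++refs; }
    unsigned long Release() override { return --refs; }
    short GetNumTables() override { return 2; }
    short GetNumTablesCached() override { return 2; }
    unsigned long refs = 0;
};

// Client: try the newest interface first, then fall back to the old one.
// Returns the interface version used (2 or 1), or 0 if neither is available.
int CountTables(IUnknownLike* obj, short& n) {
    void* pv = nullptr;
    if (obj->QueryInterface(Iid::IDB2, &pv) == kOk) {
        auto* db2 = static_cast<IDB2Like*>(pv);
        n = db2->GetNumTablesCached();
        db2->Release();
        return 2;
    }
    if (obj->QueryInterface(Iid::IDB, &pv) == kOk) {
        auto* db = static_cast<IDBLike*>(pv);
        n = db->GetNumTables();
        db->Release();
        return 1;
    }
    return 0;
}
```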

This is why OLE 2.0 is not OLE 2.0 anymore. It is just OLE. New features do not
require a totally new version, they are just added on top of the old ones and each
user queries for the features it needs.
DBCOMREM: COM Object in a Separate Process

Shipping and Handling of Function Calls

If this were all there were to COM, it would still be interesting, but not really allow
for a "component software revolution." (See note.) You could build tremendously
complex software using objects implemented in DLLs and use them from a client.
But what if you wanted to share an object between different processes? Or if you
did not want to load another object in your same address space for security and
reliability reasons? Or what if you wanted to share an object between several
machines, like a SQL Server Database?

Note: Other technologies, such as OpenDoc or SOM, actually stop here! Their
underlying technology does not even provide a way for objects in different
processes to communicate with each other: Their programming model for in-process
and out-of-process objects is fundamentally different (SOM vs. DSOM),
and their standardization is based on the in-process model!
COM provides a very easy way to make objects in other processes (on the same
machine, and soon on remote machines) appear as standard COM objects as we
know them now. The underlying idea is to simply extend the idea of directing a
function call through a vtable—you provide a special object that implements your
interface in a special way. A function in this special object (called a proxy object)
receives all the parameters, writes them sequentially into a buffer in memory,
and sends the buffer to the other process. In the other process, another special
object (called a stub object) receives this buffer, unpacks the parameters again,
and calls the function that the caller meant to call in the first place. Then the
return value(s) are packed into another buffer, sent back to the calling process,
and the proxy unpacks the return value(s) and returns them to the caller.
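To make the round trip concrete, here is a toy, single-process model. The proxy implements the interface by writing a method number and the [in] parameter into a byte buffer; the stub reads them back, calls the real object, and packs the return value and the [out] parameter into a reply. All names (IRowCount, RealDB, Proxy, Stub, Put, Get) are hypothetical, the "send" is a direct function call, and no byte-order conversion is done; real marshalling code is generated by MIDL and handles all of that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// The interface both sides agree on (one IDBInfo-style method).
struct IRowCount {
    virtual long GetNumRows(short table, short& rows) = 0;
    virtual ~IRowCount() = default;
};

// The real object, living in the "server process".
class RealDB final : public IRowCount {
public:
    long GetNumRows(short table, short& rows) override {
        rows = static_cast<short>(table * 10);  // placeholder data
        return 0;                               // stand-in for S_OK
    }
};

using Buffer = std::vector<std::uint8_t>;

// Append a value's bytes (same machine, so no byte-order fix-up).
template <class T> void Put(Buffer& b, const T& v) {
    const auto* p = reinterpret_cast<const std::uint8_t*>(&v);
    b.insert(b.end(), p, p + sizeof v);
}
// Read a value back, advancing the read position.
template <class T> T Get(const Buffer& b, std::size_t& pos) {
    T v;
    std::memcpy(&v, b.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}

// Stub: unpacks the request, calls the real object, packs the results.
class Stub {
public:
    explicit Stub(IRowCount* target) : target_(target) {}
    Buffer Invoke(const Buffer& request) {
        std::size_t pos = 0;
        auto method = Get<std::uint16_t>(request, pos);
        Buffer reply;
        if (method != 0) {      // only method 0 (GetNumRows) exists here
            Put(reply, -1L);    // hypothetical "unknown method" error code
            return reply;
        }
        short table = Get<short>(request, pos);  // [in] parameter
        short rows = 0;
        long hr = target_->GetNumRows(table, rows);
        Put(reply, hr);    // return value
        Put(reply, rows);  // [out] parameter
        return reply;
    }
private:
    IRowCount* target_;
};

// Proxy: same interface as the real object, but it only packs and unpacks.
class Proxy final : public IRowCount {
public:
    explicit Proxy(Stub& channel) : channel_(channel) {}
    long GetNumRows(short table, short& rows) override {
        Buffer request;
        Put(request, std::uint16_t{0});           // which method to call
        Put(request, table);
        Buffer reply = channel_.Invoke(request);  // stands in for the cross-process send
        std::size_t pos = 0;
        long hr = Get<long>(reply, pos);
        rows = Get<short>(reply, pos);
        return hr;
    }
private:
    Stub& channel_;
};
```

The client only ever holds an IRowCount pointer, so it cannot tell whether it is talking to RealDB directly or to the proxy.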

This sounds complicated (and actually is even a bit more complicated), but to
both the client and the server, this whole process of packing, sending, and
unpacking is completely transparent, except for the difference in speed due to
the context switch or the network message sent.

All that an object has to supply is the code that actually packs and unpacks the
parameters. COM takes care of shipping and handling—setting up the connections
between proxy and stub, sending the packed parameters and return values. The
marshalling code (as the proxy and stub are also called) is not provided on a per-
object basis, but on a per-interface basis: The interface designer can provide the
proxy and stub objects and everybody wanting to use this interface takes
advantage of the remoting capability.

We will provide a DLL that implements a COM object and takes care of this
packing and unpacking (also called marshalling and unmarshalling). We will
register this COM object like any other COM object under the
HKEY_CLASSES_ROOT\CLSID registry key.
In addition, we will tell COM that this specific object is able to marshal a specific
interface by registering its CLSID under another key called
HKEY_CLASSES_ROOT\Interface\{iid}. COM knows how to marshal IUnknown,
and when asked for a different interface, it looks under this key, finds the CLSID
of the object that handles marshalling, and starts using it. I will not go into detail
on this here, since it is explained in great detail in Kraig Brockschmidt's Inside
OLE.

The object that does marshalling and unmarshalling is also called a proxy or stub.
The part that pretends to be the object on the side of the client is called the
proxy, and the part pretending to be the client on the side of the object is called
the stub. The entry under HKEY_CLASSES_ROOT\Interface\{iid} is therefore
called ProxyStubClsid32. There is another entry, NumMethods, that simply
indicates the number of methods in this interface, including the three IUnknown
methods.

The generation of the proxy/stub object is actually trivial: The header file defining
the interface needs to be enhanced to indicate more about the parameters than
the C++ language provides, such as which parameters only go into the object,
which parameters only go out (are actually return values), and which go in and
out. Also, the length of buffers needs to be defined, and some other tedious
details about pointers need to be added. For most simple parameters, the
changes are actually straightforward, as we will see. This modified header file is
not in C++ anymore—the language is called IDL, or Interface Definition
Language.

This IDL file is then passed to a compiler for this language, which generates C
source code necessary to build the DLL.

Exporting an Object from an Executable

There is now just one minor technical detail missing. How can COM call into an
object that is implemented in an executable? EXEs do not provide callable entry
points, so we cannot use the same approach as we did with a DLL using
DllGetClassObject.

The solution is relatively simple and revolves around the class factory object: When
initializing, the EXE simply calls an application programming interface (API)
function in the COM libraries (CoRegisterClassObject) and passes it a pointer to
IUnknown on a class factory object. COM saves this pointer in an internal table
and uses it when clients want to create an object of this class. The only problem
this creates in the object's code is that class factory objects are not supposed to
keep an object running once a "real" object has been created and released. The
object's code is not needed anymore, even if it still has its class factories
registered with the COM libraries. Thus for global reference counting, references
to the class factory object should not count. When the executable is finished, it
revokes the class factory objects (CoRevokeClassObject), passing in an
identifier that COM returned in CoRegisterClassObject.
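The registration step can be pictured as a table keyed by the cookie that CoRegisterClassObject hands back. The sketch below is a hypothetical model, not COM's implementation: a 32-bit number stands in for the CLSID, FactoryLike stands in for the factory's IUnknown pointer, and ServerLifetime encodes the rule above that references to the class factory do not keep the server alive.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Placeholder for an IUnknown* on a registered class factory.
struct FactoryLike { int id; };

class ClassTable {
public:
    // Like CoRegisterClassObject: remember the factory and return a cookie.
    std::uint32_t Register(std::uint32_t clsid, FactoryLike* factory) {
        std::uint32_t cookie = next_++;
        entries_[cookie] = Entry{clsid, factory};
        return cookie;
    }
    // Like CoRevokeClassObject: forget the entry identified by the cookie.
    bool Revoke(std::uint32_t cookie) { return entries_.erase(cookie) == 1; }
    // What COM does when a client asks for the class.
    FactoryLike* Find(std::uint32_t clsid) const {
        for (const auto& kv : entries_)
            if (kv.second.clsid == clsid) return kv.second.factory;
        return nullptr;
    }
private:
    struct Entry { std::uint32_t clsid; FactoryLike* factory; };
    std::map<std::uint32_t, Entry> entries_;
    std::uint32_t next_ = 1;
};

// Live objects and server locks keep the EXE running;
// references to the registered class factory deliberately do not.
struct ServerLifetime {
    unsigned long objects = 0;
    unsigned long locks = 0;
    unsigned long factoryRefs = 0;  // tracked, but ignored by ShouldExit
    bool ShouldExit() const { return objects == 0 && locks == 0; }
};
```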

A COM object in an executable (also called a local server) is registered very
similarly to an object in a DLL (in-proc server). The entry is called LocalServer32
and contains the path to the executable. Both entries can be present, allowing
clients to choose which implementation they prefer.

For the client, all this is completely transparent. It just calls CoGetClassObject
(or CoCreateInstance). If it asks for just a server (CLSCTX_SERVER), COM first
checks if there is an in-proc server registered, and if not, it checks for a local
server. It then loads the local server by running the executable and waits for the
executable to call CoRegisterClassObject. It then wraps the pointer with a
special proxy/stub for IUnknown (class factory objects are always registered as
IUnknown first) and returns the client a pointer to the proxy. From there on,
COM uses the mechanisms briefly described above: When the client calls
QueryInterface, COM loads another stub object on the server, lets it wrap the
object's interface, connects the stub to a proxy that it loads on the client's side,
and returns a pointer to the proxy to the client.
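The lookup order for CLSCTX_SERVER can be modelled as a check of the registered entries in order of preference. This is a simplified, hypothetical sketch: the Context values and ChooseServer are stand-ins rather than the real CLSCTX constants or COM's actual algorithm, and the file names db.dll and db.exe are made up for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified model of the two server entries under HKEY_CLASSES_ROOT\CLSID\{...}.
struct ClassEntry {
    std::string inprocServer32;  // empty if not registered
    std::string localServer32;   // empty if not registered
};

// Stand-in context flags, not the real CLSCTX values.
enum Context : unsigned { kInproc = 1u, kLocal = 2u, kServer = kInproc | kLocal };

// Picks a server the way the text describes: in-proc first, then local.
// Returns an empty string if nothing suitable is registered.
std::string ChooseServer(const std::map<std::string, ClassEntry>& registry,
                         const std::string& clsid, unsigned ctx) {
    auto it = registry.find(clsid);
    if (it == registry.end()) return "";
    const ClassEntry& e = it->second;
    if ((ctx & kInproc) && !e.inprocServer32.empty()) return e.inprocServer32;
    if ((ctx & kLocal) && !e.localServer32.empty()) return e.localServer32;
    return "";
}
```

A client that explicitly asks for the local context gets the executable even when a DLL is registered, which is how it can trade speed for isolation.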

The client never sees any of this; it just gets pointers to vtables, which it uses as
before. The only time a client might be aware of an in-proc/remote difference is
while creating the object: It can indicate to COM the context that it wants the
object to run in—CLSCTX_LOCAL_SERVER, CLSCTX_INPROC_SERVER, or just
CLSCTX_SERVER. There are some other flags, which I will not discuss here.

Creating the Proxy/Stub Object

COM needs just a little bit of help to perform all these miracles: It needs an
object that knows how to pack and unpack the specific parameters of an
interface's functions.

The tool that generates these helper objects is the MIDL compiler (Microsoft IDL
compiler). We feed it with a simple IDL file, which is basically an extended C++-
header file, and it returns a whole bunch of files:

• Dlldata.c and xxx_p.c: Files compiled into a DLL.

• Xxx.h: A C++ header file with the interface declaration. All IDL
extensions are either removed or commented out.

• Xxx_i.c: A file with the definitions of the interface IDs.

The header file and the file with the interface IDs will form part of the interface
definition: They will be used by clients of the object and by the object itself.
We will put the two other files into a new directory, \ProxyStub, where we will
also create a project for the DLL, a module definition file, and an additional file
with support for self-registration.

The DLL will export one COM object that provides proxy/stubs for all four
interfaces. The object provides methods that let COM query for the correct
proxy/stub implementation (see Inside OLE for details). Thus, under one CLSID,
COM finds the proxy/stubs for four interfaces. By default, the MIDL-generated
code uses the interface ID of the first interface declared in the IDL file as the
CLSID for the proxy/stub object. Look in RPCPROXY.H for instructions on
changing the default and some other options.

For the default proxy/stub, we will have to register the following keys:

CLSID\{30DF3432-0266-11cf-BAA6-00AA003E0EED}="DB Sample ProxyStub"
// This is the IID of IDB used as the CLSID of the proxy/stub for all
// four interfaces.
CLSID\{30DF3432-0266-11cf-BAA6-00AA003E0EED}\InprocServer32=<path>\db.dll
Interface\{30DF3432-0266-11cf-BAA6-00AA003E0EED}="IDB"
Interface\{30DF3432-0266-11cf-BAA6-00AA003E0EED}\ProxyStubClsid32="{30DF3432-0266-11cf-BAA6-00AA003E0EED}"
Interface\{30DF3432-0266-11cf-BAA6-00AA003E0EED}\NumMethods = "10"
Interface\{30DF3433-0266-11cf-BAA6-00AA003E0EED}="IDBAccess"
Interface\{30DF3433-0266-11cf-BAA6-00AA003E0EED}\ProxyStubClsid32="{30DF3432-0266-11cf-BAA6-00AA003E0EED}"
Interface\{30DF3433-0266-11cf-BAA6-00AA003E0EED}\NumMethods = "5"
Interface\{30DF3434-0266-11cf-BAA6-00AA003E0EED}="IDBManage"
Interface\{30DF3434-0266-11cf-BAA6-00AA003E0EED}\ProxyStubClsid32="{30DF3432-0266-11cf-BAA6-00AA003E0EED}"
Interface\{30DF3434-0266-11cf-BAA6-00AA003E0EED}\NumMethods = "5"
Interface\{30DF3435-0266-11cf-BAA6-00AA003E0EED}="IDBInfo"
Interface\{30DF3435-0266-11cf-BAA6-00AA003E0EED}\ProxyStubClsid32="{30DF3432-0266-11cf-BAA6-00AA003E0EED}"
Interface\{30DF3435-0266-11cf-BAA6-00AA003E0EED}\NumMethods = "6"
This looks like a lot of work, but the MIDL-generated code even provides
implementations of DllRegisterServer and DllUnregisterServer if you
compile the C code with the preprocessor symbol REGISTER_PROXY_DLL. For
those of you who prefer to see how it is done, I also implemented a manual
version of the registration functions.
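To make the structure of those keys concrete, here is a minimal portable sketch that only builds the key/value list a manual DllRegisterServer would write. It assumes the MIDL default described above (the first interface's IID doubles as the proxy/stub CLSID); the names InterfaceInfo, RegEntry, and BuildProxyStubEntries are hypothetical and not part of the sample. A real Windows implementation would write each pair under HKEY_CLASSES_ROOT with RegCreateKeyEx and RegSetValueEx.

```cpp
#include <string>
#include <utility>
#include <vector>

// One interface served by the shared proxy/stub DLL (hypothetical type).
struct InterfaceInfo {
    std::string iid;   // IID in registry format, including braces
    std::string name;  // friendly name, e.g. "IDB"
    int numMethods;    // includes the three IUnknown methods
};

// Key path (relative to HKEY_CLASSES_ROOT) and its default value.
using RegEntry = std::pair<std::string, std::string>;

// Builds the entries listed above. The proxy/stub CLSID is taken to be
// the IID of the first interface, as in the MIDL default.
std::vector<RegEntry> BuildProxyStubEntries(const std::vector<InterfaceInfo>& ifaces,
                                            const std::string& description,
                                            const std::string& dllPath) {
    std::vector<RegEntry> entries;
    if (ifaces.empty()) return entries;
    const std::string psClsid = ifaces.front().iid;
    entries.push_back({"CLSID\\" + psClsid, description});
    entries.push_back({"CLSID\\" + psClsid + "\\InprocServer32", dllPath});
    for (const InterfaceInfo& info : ifaces) {
        const std::string key = "Interface\\" + info.iid;
        entries.push_back({key, info.name});
        // Every interface points at the same proxy/stub CLSID.
        entries.push_back({key + "\\ProxyStubClsid32", psClsid});
        entries.push_back({key + "\\NumMethods", std::to_string(info.numMethods)});
    }
    return entries;
}
```

Keeping the key layout in one table means DllUnregisterServer can walk the same list in reverse to delete what was written.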

IDL

Some comments regarding the IDL file: We need to import unknwn.idl in
order to derive our interfaces from IUnknown. If you need Windows types
such as DWORD, you can also import wtypes.idl.
Each interface is prefixed by a special header with the following structure:

[object, uuid(606C3DE0-FCF4-11ce-BAA3-00AA003E0EED),
pointer_default(unique)]

This header instructs MIDL to generate a proxy/stub ("object"), tells it the IID
("uuid"), and treats pointers as unique by default (see the RPC reference for more
information).

MIDL does not accept parameters by reference. Since references are just
pointers at the binary level, we can "cheat" MIDL with some typedefs
(refshort.idl; I copied this idea from wtypes.idl): for C we provide a pointer to
short, for C++ a reference to short.
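A minimal sketch of the typedef trick, assuming a hypothetical name REFSHORT for whatever refshort.idl actually declares. A C++ compiler sees a reference and a C compiler sees a pointer; both pass the caller's address, so the binary interface is the same. The GetNumTables function here is likewise hypothetical and written for the C++ side only (a C implementation would assign through *nNumTables).

```cpp
// REFSHORT is a hypothetical stand-in for the typedef in refshort.idl.
#ifdef __cplusplus
typedef short& REFSHORT;   // C++ clients see a reference to short
#else
typedef short* REFSHORT;   // C clients see a pointer to the same address
#endif

// Hypothetical method in the style of IDBInfo::GetNumTables (C++ side).
long GetNumTables(REFSHORT nNumTables) {
    nNumTables = 3;   // assigns through the reference to the caller's variable
    return 0;         // 0 is the value of S_OK
}
```

The C++ client can keep calling GetNumTables(n) exactly as before the interface was described in IDL.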

To receive string parameters ([out]), we use a fixed buffer size, both
to avoid having to free returned memory and to maintain "compatibility" with
previous clients. This is indicated by a [size_is(80)] attribute on the string
parameter. (See the RPC reference for more information.)
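The server side of a fixed-size [out] string can be sketched as follows, assuming the caller always supplies an 80-element wide-character buffer. The helper name CopyToFixedBuffer is hypothetical; it shows that the server only writes into caller-owned memory (truncating and null-terminating), so the client never has to free anything.

```cpp
#include <algorithm>
#include <cstddef>
#include <cwchar>
#include <string>

constexpr std::size_t kBufferSize = 80;  // matches [size_is(80)]

// Hypothetical helper: copies value into the caller's fixed buffer,
// truncating if necessary and always writing a terminating null.
void CopyToFixedBuffer(const std::wstring& value, wchar_t* out) {
    const std::size_t n = std::min(value.size(), kBufferSize - 1);
    std::wmemcpy(out, value.c_str(), n);
    out[n] = L'\0';
}
```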

Changes: Step by Step

The Proxy/Stub

1. Copy interface\dbsrv.h and save it as interface\idb.idl.

2. Add [in], [out], and [size_is] attributes, remove the DEFINE_GUIDs, and provide
[object] headers for each interface.

3. Create refshort.idl to provide C and C++ declarations for the short reference parameters.

4. Import unknwn.idl and refshort.idl in idb.idl.

5. Compile idb.idl with the following command, running it from the working
directory \ProxyStub:

midl ..\interface\idb.idl /header ..\interface\idb.h /iid ..\interface\idb_i.c /ms_ext /c_ext

This instructs the compiler to write the interface files to our interface directory.

6. Compile refshort.idl with midl ..\interface\refshort.idl /out ..\interface
/ms_ext /c_ext. This will provide refshort.h, which is included in the
generated idb.h.

7. Create a project for a plain DLL (idbps.mak) and include dlldata.c,
idb_p.c, and interface\idb_i.c. Change idb_p.c to include ..\interface\idb.h
instead of idb.h. (The MIDL compiler does not change the paths in the
generated include files; you will have to apply this change every time you
recompile the IDL file.)

8. Link with rpcrt4.lib to include the RPC run-time library used by the
generated proxy/stub.

9. Create a module definition file called idbps.def, add it to the project, and
export DllGetClassObject, DllCanUnloadNow, DllRegisterServer, and
DllUnregisterServer.

10. Self-registration. You can either define the REGISTER_PROXY_DLL symbol in
the preprocessor options or, if you do not want to use the self-registration
provided by MIDL for whatever reason, create a file called idbpsref.cpp and
implement DllRegisterServer/DllUnregisterServer (see above for the exact keys
we need to register). You could also register your proxy/stub manually (or in
a special installer, or with a simple .REG file), but it is much more in keeping
with object-oriented philosophy to provide self-registration within the same DLL.

11. Compile the DLL and register it using regsvr32.exe.

We have just extended COM to handle our four custom interfaces for remote
access. Note that I gave all the files the prefix IDB, to indicate that they deal
with the "database" interfaces and not with the object implementing these
interfaces.

The Object

With the "expanded" COM, all we need to provide is an object
implemented in an executable.

12. Remove the IID definitions from dbsrv.h. They will be provided in
IDB.H/IDB_I.C. Leave only the DEFINE_GUID for the CLSID of the object.
Again, note that the proxy/stub is related to the interface, not to the
database object itself. Any object that wants to use this custom interface
will use the same proxy/stub. The proxy/stub is actually an extension of
COM and not part of the object.

13. Include idb.h.


14. Create a new MFC project called ObjectEXE\DBLocal.mak. Defaults: MDI,
no support for anything OLE-related.

15. Add #define _AFX_NO_BSTR_SUPPORT and #include <ole2.h> to stdafx.h.
Define a preprocessor symbol, LOCALSERVER, which we will use for
conditional compilation of dbsrv.cpp. Add targets for Unicode. Add the
OLE32.LIB and UUID.LIB libraries.

16. Add Object\DBSRV.CPP and Object\DBSrvFact.CPP to the project. We will
use a common code base for in-process and local servers. Also add
interface\idb_i.c and interface\guids.cpp, and disable precompiled headers
for both.

17. In CDBLocalApp::InitInstance, call CoInitialize, create a CDBFactory object,
and register it with CoRegisterClassObject. Also check for the command-line
parameters /REGSERVER and /UNREGSERVER and call DllRegisterServer or
DllUnregisterServer if appropriate. You could also have the object register
itself every time it runs (except when executed with /UNREGSERVER), as this
code does. Because step 15 adds Unicode targets, string literals use _T():

BOOL CDBLocalApp::InitInstance() {
    if (m_lpCmdLine[0] != '\0') {
        if (lstrcmpi(m_lpCmdLine, _T("/REGSERVER"))==0) {
            if (FAILED(DllRegisterServer())) {
                AfxMessageBox(_T("Unable to register the server!"));
            }
            return FALSE;
        }
        else if (lstrcmpi(m_lpCmdLine, _T("/UNREGSERVER"))==0) {
            if (FAILED(DllUnregisterServer())) {
                AfxMessageBox(_T("Unable to unregister the server!"));
            }
            return FALSE;
        }
    }
    DllRegisterServer();   // re-register on every start
    if (FAILED(CoInitialize(NULL))) {
        return FALSE;
    }
    CDBFactory *pFactory=new CDBFactory();
    pFactory->AddRef();
    if (FAILED(CoRegisterClassObject(CLSID_DBSAMPLE, (IUnknown*) pFactory,
            CLSCTX_INPROC_SERVER | CLSCTX_LOCAL_SERVER, REGCLS_MULTIPLEUSE,
            &m_dwDBFactory))) {
        pFactory->Release();
        return FALSE;
    }
    pFactory->Release(); // COM keeps a reference to the class factory.
    (...)
}

18. Add CDBLocalApp::ExitInstance, revoke the class factory object
(CoRevokeClassObject), and then call CoUninitialize().

int CDBLocalApp::ExitInstance() {
    if (m_dwDBFactory) {
        CoRevokeClassObject(m_dwDBFactory);
        m_dwDBFactory=0;
        // Registration succeeded, so CoInitialize did too; balance it here.
        CoUninitialize();
    }
    return CWinApp::ExitInstance();
}

19. Declare CDBLocalApp::m_dwDBFactory and initialize it to 0 in the
constructor. This data member saves the identifier that COM returns when
registering the class factory object. Include ...\object\dbsrvimp.h in
DBLocal.cpp. Remove the call to OnFileNew, because we will associate a
document with each server object created.

20. Change DllRegisterServer/DllUnregisterServer to write the name of the
EXE under LocalServer32 instead of the name of the DLL under InprocServer32.
(Use #ifdef LOCALSERVER to keep a common code base for the local and
in-process servers!)

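21. CDBFactory::AddRef and Release should not modify g_dwRefCount for
a LOCALSERVER. COM holds a reference to the registered class factory until
CoRevokeClassObject is called, so counting that reference would keep the
server from ever shutting down.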

22. For a LOCALSERVER: CDB::Release() should check g_dwRefCount and
close the EXE if it is no longer used.

ULONG CDB::Release() {
    g_dwRefCount--;
    m_dwRefCount--;
    if (m_dwRefCount==0) {
#ifdef LOCALSERVER
        if (g_dwRefCount==0) {
            AfxGetMainWnd()->PostMessage(WM_CLOSE);
        }
#endif
        delete this;
        return 0;
    }
    return m_dwRefCount;
}
The following steps are optional. They create a document whenever a
database object is created, and close the document when the database
object is destroyed. Note that this implementation is not thread-safe;
you would need to synchronize object creation.

23. Add a member CDBLocalApp::m_pLastDoc, set it to NULL in the
constructor, and set its value to the this pointer in the constructor of
CDBLocalDoc.

24. In CDB::CDB, create a new document and keep a pointer to it (#ifdef
LOCALSERVER): Declare CDB as a friend of CDBLocalApp, call
CDBLocalApp::OnFileNew, obtain m_pLastDoc, and save it in a data
member CDB::m_pDoc. Don't forget to include dblocal.h.

25. In CDB::~CDB, close the document m_pDoc.

26. Declare CDB::m_pDoc (inside #ifdef LOCALSERVER!) and
DllRegisterServer/DllUnregisterServer in dbsrv.h.

27. Optional: It is a good idea to make the object thread-safe. This is not
required for remoting, because by default OLE calls an object on only a
single thread. However, it is easy to make your object thread-safe and be
prepared for OLE's multithreading models (see the SDK for details).

If multiple functions are executed "simultaneously," we must make sure
that all our code is prepared for this by (a) making it reentrant, and (b)
synchronizing access to global memory. Reentrancy is easy: most
variables are local variables on the stack and are allocated exclusively for
each thread executing a function. In our sample, all we have to worry
about are the reference counters (both the global and the per-object
counter) and the "database" structures. We will secure AddRef and Release
by using the Win32 API functions InterlockedIncrement and
InterlockedDecrement. The structures use MFC classes that are already
thread-safe for threads accessing different objects, so we only need to
protect each object instance against concurrent use, with one critical
section per object.

Here is the thread-safe implementation of AddRef and Release in dbsrv.cpp
(the class factory in dbsrvfact.cpp is analogous):

(...)
ULONG CDB::AddRef() {
    InterlockedIncrement((long*) &g_dwRefCount);
    // Return the value produced by this increment; re-reading
    // m_dwRefCount could observe another thread's change.
    return (ULONG) InterlockedIncrement((long*) &m_dwRefCount);
}
ULONG CDB::Release() {
    ULONG dwGlobalCount=(ULONG) InterlockedDecrement((long*) &g_dwRefCount);
    ULONG dwRefCount=(ULONG) InterlockedDecrement((long*) &m_dwRefCount);
    if (dwRefCount==0) {
#ifdef LOCALSERVER
        if (dwGlobalCount==0) {
            AfxGetMainWnd()->PostMessage(WM_CLOSE);
        }
#endif
        delete this;
        return 0;
    }
    return dwRefCount;
}
Making the object itself thread-safe: Declare a critical section in CDB:

class CDB : public IDB, public IDBAccess, public IDBManage, public IDBInfo {
    (...)
// Implementation
private:
    (...)
    ULONG m_dwRefCount;
    CRITICAL_SECTION m_secDB;
    (...)
};
The implementation in dbsrv.cpp uses the critical section:

HRESULT CDB::Read(short nTable, short nRow, LPWSTR lpszData) {
    EnterCriticalSection(&m_secDB);
    (...)
    LeaveCriticalSection(&m_secDB);
    return NO_ERROR;
}
(...)
CDB::~CDB() {
    // GetNumTables and Delete enter m_secDB again; a Win32 critical
    // section may be re-entered by the thread that already owns it.
    EnterCriticalSection(&m_secDB);
    short nNumTables;
    for (GetNumTables(nNumTables); nNumTables>0; GetNumTables(nNumTables)) {
        Delete(nNumTables-1);
    }
#ifdef LOCALSERVER
    m_pDoc->OnCloseDocument();
    m_pDoc=NULL;
#endif
    LeaveCriticalSection(&m_secDB);
    DeleteCriticalSection(&m_secDB);
}
CDB::CDB() {
    InitializeCriticalSection(&m_secDB);
    m_dwRefCount=0;
#ifdef LOCALSERVER
    ((CDBLocalApp*) AfxGetApp())->OnFileNew();
    m_pDoc=((CDBLocalApp*) AfxGetApp())->m_pLastDoc;
#endif
}
Compile the local server. If you want, also compile the in-process server
to validate your "common code base" (you'll need to add idb_i.c for the
IIDs!).
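Outside Windows, the reference-counting and locking rules from steps 21 through 27 can be sketched in standard C++. This is an analog under stated assumptions, not the sample's CDB: std::atomic stands in for InterlockedIncrement/InterlockedDecrement, std::recursive_mutex stands in for the per-object CRITICAL_SECTION (Win32 critical sections may be re-entered by their owning thread, which a plain std::mutex does not allow), and the hypothetical callback g_onServerIdle takes the place of posting WM_CLOSE to the main window. The class Database and its methods are illustrative.

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Server-wide count of outstanding references (the role of g_dwRefCount).
std::atomic<unsigned long> g_refCount{0};
// Hypothetical hook run when the last reference in the server goes away.
std::function<void()> g_onServerIdle;

class Database {
public:
    unsigned long AddRef() {
        ++g_refCount;
        return ++m_refCount;  // value produced by this increment
    }
    unsigned long Release() {
        const unsigned long global = --g_refCount;
        const unsigned long local = --m_refCount;
        if (local == 0) {
            if (global == 0 && g_onServerIdle) g_onServerIdle();
            delete this;
            return 0;
        }
        return local;
    }
    void AddTable(const std::string& name) {
        // lock_guard unlocks on every return path, unlike a manual
        // Enter/Leave pair that an early return could skip.
        std::lock_guard<std::recursive_mutex> lock(m_lock);
        m_tables.push_back(name);
    }
    short GetNumTables() {
        std::lock_guard<std::recursive_mutex> lock(m_lock);
        return static_cast<short>(m_tables.size());
    }

private:
    ~Database() {  // only Release() may destroy the object
        std::lock_guard<std::recursive_mutex> lock(m_lock);
        while (GetNumTables() > 0) m_tables.pop_back();  // re-enters m_lock
    }
    std::atomic<unsigned long> m_refCount{0};
    std::recursive_mutex m_lock;
    std::vector<std::string> m_tables;
};
```

Making the destructor private mirrors the COM rule that clients never delete an object directly; they only call Release.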
The Client

The clients DBCOM and DBCOMMUL work without change! You just have to
unregister the in-process server and register the local server, and COM does
the rest.

We will make a minor change to allow the user to choose between the
in-process and the local server. In a real-world application this is a design
decision, but for the purpose of this section we provide a "user interface":

28. Before creating an object, show a message box and let the user choose.
Use CLSCTX_LOCAL_SERVER to force a local server if both server types
are registered. Use CLSCTX_SERVER to use an in-process server if one is
present and fall back to a local server otherwise.

(...)
if (AfxMessageBox(_T("Do you want a local server?"), MB_YESNO)==IDYES) {
    hRes=CoGetClassObject(CLSID_DBSAMPLE, CLSCTX_LOCAL_SERVER, NULL,
        IID_IClassFactory, (LPVOID*) &pDBFactory);
}
else {
    hRes=CoGetClassObject(CLSID_DBSAMPLE, CLSCTX_SERVER, NULL,
        IID_IClassFactory, (LPVOID*) &pDBFactory);
}
(...)

29. Add idb_i.c to the project for the IIDs (we removed them from dbsrv.h!).
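The two CoGetClassObject calls in step 28 differ only in their context flags. The following is a simplified model of that choice, assuming COM tries the requested contexts from most to least efficient (in-process server, in-process handler, local server, remote server). The constants mirror the CLSCTX bit values, but the names and the ChooseContext function are hypothetical; the real lookup also involves the registry and the service control manager.

```cpp
// Hypothetical constants mirroring the CLSCTX bit values.
enum : unsigned {
    kInprocServer  = 0x1,   // CLSCTX_INPROC_SERVER
    kInprocHandler = 0x2,   // CLSCTX_INPROC_HANDLER
    kLocalServer   = 0x4,   // CLSCTX_LOCAL_SERVER
    kRemoteServer  = 0x10,  // CLSCTX_REMOTE_SERVER
};
// CLSCTX_SERVER combines the three server contexts.
constexpr unsigned kServer = kInprocServer | kLocalServer | kRemoteServer;

// Simplified model: returns the context that would be used, or 0 if
// no requested context is available.
unsigned ChooseContext(unsigned requested, unsigned available) {
    const unsigned order[] = {kInprocServer, kInprocHandler, kLocalServer, kRemoteServer};
    for (unsigned ctx : order) {
        if ((requested & ctx) && (available & ctx)) return ctx;
    }
    return 0;
}
```

With both servers registered, CLSCTX_SERVER picks the in-process server, while CLSCTX_LOCAL_SERVER forces the EXE, which is exactly the choice the message box offers.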

COM provides a completely transparent model for in-process and out-of-process
objects. The same object code can be used as a high-performance, low-security
in-process object or as a slower (network-speed), high-security out-of-process
object. The client sees no difference between using an in-process and an
out-of-process object: our client from the DBCOM sample works transparently
with a server implemented in another process!

Conclusion

As you have seen, it does not really take a lot to implement and use objects
based on the programming model introduced by COM. The techniques are very
valuable on their own, even if you do not intend to use the COM Libraries for
whatever (strange!) reason.
Taking advantage of COM allows you to:

• Package your objects in a uniform way as executables or dynamic-link
libraries.

• Break your application into truly reusable components, ones that you can
redistribute easily to others.

• Be prepared to distribute your components on multiple machines through
a simple change in configuration. As COM is being ported to other
platforms, you can even take advantage of real, standardized, efficient
interoperability between different platforms.

Some general tips for implementing your objects using COM:

• Before implementing a custom interface, look for interfaces that COM
and/or OLE already provide. A wide variety of functionality is already
covered by these interfaces. The advantages are several:

  • You can take advantage of the remoting code provided for these
  interfaces.

  • Chances are high that other standard clients, ones that you may not
  have thought of, will be able to use at least part of the functionality of
  your object, just because they know how to use a COM/OLE-provided
  interface.

  • For some interfaces, COM provides additional functionality for free
  through standard implementations.

• When you design your interfaces, try to make them as general as possible,
so that you or others will be able to take advantage of them in the
future. Publish your interfaces as widely as possible; if you can make them
a standard, they will be no different from any Microsoft-provided interface,
because they merge into the extensible COM architecture.

Making an object "COM compliant" involves very little overhead, in terms of both
implementation effort and run-time cost.

To implement a COM object, simply reuse all the "adorning" code, such as class
factory objects, the IUnknown implementation, and registration code (you could even
make it a macro or a C++ template), and start implementing your object's
functionality!
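As one illustration of the "C++ template" idea, here is a minimal sketch of reusable reference counting using the curiously recurring template pattern. RefCounted, Greeter, and g_greetersDestroyed are hypothetical names; the sketch covers only AddRef and Release, leaving out QueryInterface, the class factory, and registration, which a fuller template would also supply.

```cpp
#include <atomic>
#include <string>

// Hypothetical reusable base: each object class inherits AddRef/Release
// instead of rewriting them.
template <typename Derived>
class RefCounted {
public:
    unsigned long AddRef() { return ++m_refCount; }
    unsigned long Release() {
        const unsigned long count = --m_refCount;
        if (count == 0) delete static_cast<Derived*>(this);
        return count;
    }

protected:
    RefCounted() = default;
    ~RefCounted() = default;  // deletion always goes through Derived

private:
    std::atomic<unsigned long> m_refCount{0};
};

int g_greetersDestroyed = 0;  // lets a caller observe destruction

// Only the object's own functionality remains to be written.
class Greeter : public RefCounted<Greeter> {
public:
    ~Greeter() { ++g_greetersDestroyed; }
    std::string Greet() const { return "hello"; }
};
```

Because Release casts to Derived before deleting, the base destructor does not need to be virtual.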

Overhead in memory is very small: On 32-bit Windows you basically have a 4-byte
overhead per interface (the vtable pointer in the object's instance data) plus 4 bytes
for the reference counter. Even the smallest object, with a single interface, can live
with this overhead (8 bytes), and you get all the advantages of making it a COM component.
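The per-interface cost can be checked with sizeof. This sketch assumes a typical compiler ABI (such as MSVC or the Itanium C++ ABI) that stores one vtable pointer per polymorphic base class; IA through ID are hypothetical stand-ins for IDB, IDBAccess, IDBManage, and IDBInfo. Pointer size and padding vary by platform, so the sketch compares against a lower bound instead of an exact figure: on 32-bit Windows the single-interface object is 8 bytes, while 64-bit builds use 8-byte pointers.

```cpp
#include <cstddef>
#include <cstdint>

// Four abstract "interfaces" (hypothetical stand-ins for the sample's).
struct IA { virtual void A() = 0; };
struct IB { virtual void B() = 0; };
struct IC { virtual void C() = 0; };
struct ID { virtual void D() = 0; };

// One interface plus a 32-bit reference count.
struct OneInterface : IA {
    void A() override {}
    std::uint32_t refCount = 0;
};

// Four interfaces: one vtable pointer per base, plus the counter.
struct FourInterfaces : IA, IB, IC, ID {
    void A() override {}
    void B() override {}
    void C() override {}
    void D() override {}
    std::uint32_t refCount = 0;
};

// Lower bound: one pointer per interface plus the counter.
constexpr std::size_t MinComOverhead(std::size_t interfaces) {
    return interfaces * sizeof(void*) + sizeof(std::uint32_t);
}
```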

Performance overhead in the in-process case can be beaten only by direct
function calls (an in-process call is just a call through the vtable), and the
difference is irrelevant for any real-world use. In the local case, the alternative
to COM is to implement your own remoting infrastructure, either on top of RPC
(like COM) or through Windows messages, shared memory, semaphores, or
whatever you wish. You could probably be a little more efficient if you optimized
your implementation for your kind of interface and interface usage, but usually
COM's performance will be more than sufficient and will save you a lot of work!

Just spend some time thinking about which of your objects or libraries you could
implement as COM objects. Which would be in-process servers, which could be
local servers? How would your overall design benefit if you could use local servers?

COM can make your life a lot easier today, and prepares you for a very exciting
future!
