ADVANCED DATA STRUCTURES

Thursday, July 24, 2008

Object-Orientation and C++

C++ is just one of many programming languages in use today. Why are there so many languages? Why do new ones appear every few years? Programming languages have evolved to help programmers ease the transition from design to implementation.

The first programming languages were very dependent on the underlying machine architecture. Writing programs at this level of detail is very cumbersome. Just as hardware engineers learned how to build computer systems out of other components, language designers also realized that programs could be written at a much higher level, thereby shielding the programmer from the details of the underlying machine.

Why are there such a large number of high-level programming languages? There are languages for accessing large inventory databases, formatting financial reports, controlling robots on the factory floor, processing lists, controlling satellites in real time, simulating a nuclear reactor, predicting changing atmospheric conditions, playing chess, and drawing circuit boards. Each of these problems require different sets of data structures and algorithms. Programming languages are tools to help us solve problems. However, there is not one programming language that is best for every type of problem. New programming languages are often developed to provide better tools for solving a particular class of problems. Other languages are intended to be useful for a variety of problem domains and are more general purpose.

Each programming language imparts a particular programming style or design philosophy on its programmers. With the multitude of programming languages available today, a number of such design philosophies have emerged. These design philosophies, called programming paradigms, help us to think about problems and formulate solutions. In this issue, we begin the first part of a two-part series on the object-oriented paradigm and how C++ supports the paradigm. Before embarking on our journey into the object-oriented paradigm, we first take a look at how programming paradigms have shaped software design.

The Class: Data Encapsulation, Data Hiding, and Objects

Like a C structure, a C++ class is a data type. An object is simply an instantiation of a class. C++ classes have additional capabilities as the following example should show:

Vector v1(1,2),
Vector v2(2,3),
Vector vr;
vr = v1 + v2;

Vector is a class. v1, v2, and vr are objects of class Vector. v1 and v2 are given initial values through their constructor. vr is also initialized through its constructor to certain default values. The example illustrates a major power of C++. Namely, we can define functions on a class as well as data members. Here, we have an overloaded addition operator which makes our expression involving Vectors seem much more natural than the equivalent C code:

Vector v1, v2, vr;
add_vector( &vr , &v1, &v2 );

The ability to define these member functions allows us to have a constructor for Vector, code that creates an object of class Vector. The constructor ensures proper initialization of our Vectors.

Though not illustrated in the above example, a class can limit the use of its data members and member functions by non-member code. This is encapsulation. If class K defines member M as private, then only members of class K can use M. Defining M as public means any other class or function can use M.

Let's take a look at a trivial implementation of Vector that will show is a little about constructors, operators, and references.

#include 
class Vector
{
public:
       Vector(double new_x=0.0,double new_y=0.0)       {
               if ((new_x<100.0) && (new_y<100.0))
               {      
                       x=new_x;
                       y=new_y;
               }
               else
               {
                       x=0;
                       y=0;
               }
       }
       Vector operator +
               ( const Vector & v)
       {
               return
               (Vector (x + v.x, y + v.y));
       }
       void PrintOn (ostream& os)
       {
               os << "["
                       << x
                       << ", "
                       << y
                       << "]";
       }

private:
       double x, y;
};

int main()
{
       Vector v1, v2, v3(0.0,0.0);
       v1=Vector(1.1,2.2);
       v2=Vector(1.1,2.2);
       v3=v1+v2;
       cout << "v1 is ";
       v1.PrintOn (cout);
       cout << endl;
       cout << "v2 is ";
       v2.PrintOn (cout);
       cout << endl;
       cout << "v3 is ";
       v3.PrintOn (cout);
       cout << endl;
}

Encapsulation of x and y means that they cannot be altered without the help of specific member functions. Any member function or data member of Vector can use x and y freely. For everyone else, the member functions provide a strict interface. They ensure a particular behavior in our objects. In the example above, no Vector can be created that has an x or y component that exceeds 100. If at some point code tries to do this, then the constructor performs bounds-checking and sets x and y both to zero. In a normal C structure we can simply do the following:

Vector v1;
InitVector( & v1, 99 , 99 );
v1.x = 1000;

InitVector() closely approximates a C++ constructor. Assume it tries to behave like the constructor Vector() in example three. This C code demonstrates how without encapsulation we can easily violate the rules set up in our pseudo-constructor. With class Vector, both x and y are private. As a result, they can only be accessed by member functions. If our goal is to prevent x and y from exceeding 100, we simply have all accessor functions perform bounds-checking. In fact, once created and outside of the addition operation there's no way to modify x or y. They are private and no member function, outside of the constructor, sets their values. Notice how the constructor Vector() limits our Vector component values. By returning a new object, the addition operator uses the constructor to check for overflow.

We could have made `+' do multiplication instead. Though such manipulation is atypical, it can be quite useful. For example, C++ comes standard with a streams library which uses the << operator to provide output.

There is one useful thing about the addition operator: we don't have to pass the addresses of arguments. The arguments for the addition operator specify that the parameters are references (using the reference operator &-not the same as the address-of operator &). Recall that the reference operator allows us to use the same calling syntax as call-by-value and yet modify the value of an argument. The reference operator avoids the overhead of actually creating a new object. Thus, we can avoid a lot of indirection.

However, the most powerful OOP extension C++ provides is probably inheritance. Classes can inherit data and functions from other classes. For this purpose we can declare members in the base class as protected: not usable publicly but usable by derived classes.

In conclusion, we looked at some of the features that make C++ a better C. C++ provides stronger type checking by checking arguments to functions and reduces syntactic errors through the use of reference types. Programmers can also add new types to the language by defining classes. Although we have only taken a brief look at classes, we will see more abstract discussion of C++ object-orientation as well as general OOP concepts in upcoming columns.

A FREE STORE IS PROVIDED.

In C, variables are placed in the free store by using the sizeof() macro to determine the needed allocation size and then calling malloc() with that size. Variables are removed from the free store by calling free(). With classes, using malloc() and free() becomes tedious. C++ provides the operators new and delete, which can allocate not only built-in types but also user-defined types. This provides a uniform mechanism for allocating and deallocating memory from the free store.

For example, to allocate an integer:

int *pi;
pi = new int;
*pi = 1;

and to allocate an array of 10 ints:

int *array = new int [10];
for (int i=0;i < 10; i++)
       array[i] = i;

Just as with malloc() the memory returned by new is not initialized; only static memory has a default initial value of zero.

Suppose we have defined a type for complex numbers, called complex. We can dynamically allocate a complex number as follows:

complex* pc = new complex (1, 2);

In this case, the complex pointer pc will point to the complex number 1 + 2i.

All memory allocated using new should be deallocated using delete. However, delete takes on different forms depending on whether the variable being deleted is an array or a simple variable. For the complex number above, we simply call delete:

delete pc;

Delete calls the destructor for the object to be deleted. However, to delete each element of an array, you must explicitly inform delete that an array is to be deleted:

delete [] array;

The C++ compiler maintains information about the size and number of objects in an array and retrieves this information when deleting an array. The empty bracket pair informs the compiler to call the class destructor for each element in the array.

Be careful, attempting to delete a pointer that has not been initialized by new results in undefined program behavior. However, it is safe to apply the delete operator to a null pointer.

New and delete are global C++ operators and can be redefined (e.g., if it is desirable to trap every memory allocation). This is useful in debugging, but is not recommended for general programming. More often, the operators new and delete are overridden by providing new and delete operators for a specific class.

When C++ allocates memory for a user-defined class, the new operator for that class is used if it exists, otherwise the global new is used. Most often, programmers define new for certain classes to achieve improved memory management (i.e., reference counting for a class).

OVERLOADING.

One of the many strengths of C++ is the ability to overload functions and operators. By overloading, the same function name or operator symbol can be given several different definitions. The number and types of the arguments supplied to a function or operator tell the compiler which definition to use. Overloading is most often used to provide different definitions for member functions of a class. But overloading can also be used for functions that are not a member of any class.

Suppose we need to search different types of arrays for a certain value. We can provide implementations for searching arrays of integers, floats, and doubles:

int Search (
       const int* data,
       const int key);

int Search (
       const float* data,
       const float key);

int Search (
       const double* data,
       const double key);

The compiler will ensure that the correct function is called based on the types of the arguments passed to Search(). When arguments do not exactly match the formal parameter types, the compiler will perform implicit type conversions (e.g., int to float) in an attempt to find a match.

Overloading is most often used for member functions and operators of classes. Most classes have overloaded constructors, for there is often more than one way to create a given object. All of the built-in types also have operators such as addition, subtraction, multiplication, and division. In fact, we can mix different types and still add them together:

int i = 1;
char c = 'a';
float f = -1.0;
double d = 100.0;
int result = i + c + f + d;

The compiler takes applies the type conversions appropriate for the above calculation. When we define our own types, we can inform the compiler which operations and type conversions can be applied to our type. The compiler will allow our type to blend in with the built-in types. We will see more examples of this when we look at classes in detail.

IMPROVED TYPE SYSTEM.

Through the use of classes, user-defined types may be created, and if properly defined, C++ will behave as if they are one of the built-in types: int, char, float, and double. It is possible to define a Vector type and perform operations such as addition and multiplication just as easily as is done with ints:

// Define some arrays of doubles
double a[3] = { 11, 12, 13 };
double b[3] = { 21, 22, 23 };

// Initialize vectors from the
// double arrays
Vector v1 = a;
Vector v2 = b;

// Add the two matrices.
Vector v3 = v1 + v2;

The Vector class has been defined with all of the appropriate arithmetic operations so that it can be treated as a built-in type. It is even possible to define conversion operators so that we can convert the Vector to a double, we get the magnitude, or norm, of the Vector:

double norm = (double) v3;

DECLARATIONS AS STATEMENTS.

In a C++ program, a declaration can be placed wherever a statement can appear, which can be anywhere within a program block. Any initializations are done each time their declaration statement is executed. Suppose we are searching a linked list for a certain key:

int IsMember (const int key)
{
       int found = 0;
       if (NotEmpty())
       {
               List* ptr = head;
               // Declaration

               while (ptr && !found)
               {
                       int item = ptr->data;
                       // Declaration

                       ptr = ptr->next;
                      
                       if (item == key)
                               found = 1;
               }
       }
       return found;
}

By putting declarations closer to where the variables are used, you write more legible code.

REFERENCES

Unlike C, C++ provides true call-by-reference through the use of reference types. A reference is an alias or a name to an existing object. They are simliar to pointers in that they must be initialized before they can be used. For example, let's declare an integer:

int n = 10;

and then declare a reference to it:

int& r= n;

Now r is an alias for n; both identify the same object and can be used interchangeably. Hence, the assignment

r = - 10;

changes the value of both r and n to -10.

It is important to note that initialization and assignment are completely different for references. A reference must have an initializer. Initialization is an operator that operates only on the reference itself. The initialization

int& r = n;

establishes the correspondence between the reference and the data object that it names. Assignment behaves like we expect an operation to, and operates through the reference on the object referred to. The assignment,

r = -10;

is the same for references as for any other lvalue, and simply assigns a new value to the designated data object.

C programmers know that C uses the call-by-value parameter mechanism. In order to enable functions to modify the values of their parameters, pointers to the parameters must be used as the ``value'', which is passed. For example, a routine Swap(), which swaps its parameters would be written like this in C:

void Swap (int* a, int* b)
{
       int tmp;
       tmp = *a;
       *a = *b;
       *b = tmp;
}

The routine would be invoked like this:

int x = 1;
int y = 2;
Swap (&x, &y);

C programmers are all too familiar with what happens when one of the ampersands is forgotten; the program usually ends with a core dump!

Now consider the C++ version of Swap() which makes use of true call-by-reference.

void Swap (int& a, int& b)
{
       int tmp;
       tmp = a;
       a = b;
       b = tmp;
}

The routine would be invoked like this:

int x = 1;
int y = 2;
Swap (x, y);

The compiler ensures that the parameters of Swap() will be passed by reference. In C, often a run-time error results if the value of a parameter is passed instead of its address. References eliminates these errors and is syntactically more pleasing.

Another use for references is as return types. Consider this routine:

int& FindByIndex (int* theArray,int index)
{
       return theArray[index];
}

Note that the FindByIndex() returns a reference to the element in the array rather than its value. The expression FindByIndex (A, i) yields a reference to the ith element of the array A. Now, because a reference is an lvalue, it can be used on the left hand side of an expression, we can write:

FindByIndex(A, i) = 25;

which will assign 25 to the ith element of the array A.

Note that if FindByIndex() is made inline, the overhead due to the function call is eliminated. Inline functions that return references are attractive for the sake of efficiency.