Archive for June, 2007

Seeking Code That Might Not Exist

Monday, June 18th, 2007

I am a master of poorly defined goals. In this project, I am to locate some specific functionality within the Helix Producer’s massive pile of source code. The specific functionality, though, is not well defined. At a meeting last week, I was told to look for code that actually captures video.

So I thought to myself, “how does a program communicate with a device?” The answer is, of course, through device drivers. How do you talk to device drivers? My shaky memory dredges up the word “ioctl”. I grep for “ioctl” in the ProducerSDK code. It appears only once, in code dedicated to output filtering. That looked like a dead end.

The Helix Producer SDK Developer’s Guide on page 37 states: “Capture devices are plug-ins that wrap operating-specific capture subsystems, such as DirectShow or Video4Linux.” This states directly that the ProducerSDK delegates capturing to an underlying system. DirectShow is a Microsoft thing, so I can disregard it. Video4Linux (V4L) is my target.

Somewhere in the ProducerSDK there is code that interacts with V4L - it should be simple enough to find. I downloaded an example of C code that uses V4L (specifically: capture.c). Looking inside, I see lots of calls to “ioctl” to interact with the driver. I see no code in ProducerSDK that looks like this code.

Hmmm, I’m not sure what this means. I wonder if I have the right version of the ProducerSDK.

Is This Thread Safe?

Monday, June 11th, 2007

In my quest to understand the structure and functioning of the Helix Producer SDK code, I am looking at it one tiny piece at a time. Since it is our goal to extract useful code from Helix Producer to include in our project, someone needs to understand the dependencies that we might encounter. Today I started examining the system of smart pointers.

Digression - what is a smart pointer?

Smart pointers, sometimes referred to as reference counted pointers, are a memory management technique used by C++ programmers. Unlike Java, C# or Python, C++ does not automatically garbage collect unused memory. A smart pointer adds a layer of indirection: an instance of a special pointer class wraps a traditional C style pointer. This class instance has special copy constructor and assignment operator semantics: when it is copied, a counter on the thing it is pointing to is incremented. When one of these pointer wrappers is destructed, the counter is decremented. If the counter goes to zero, then it is known that the thing pointed to is no longer referenced by anything and can itself be deallocated.
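To make the digression concrete, here is a minimal single-threaded sketch of the idea. The names RefCounted, SmartPtr and Widget are my own inventions for illustration; this is not the Helix code, and it leaves out the assignment operator entirely.

```cpp
#include <cassert>

// Hypothetical base class that carries the reference counter.
// Single-threaded sketch only: the counter is a plain long.
class RefCounted {
public:
    void AddRef() { ++count_; }
    // Returns true when the last reference has just been dropped.
    bool Release() { return --count_ == 0; }
    long Count() const { return count_; }
    virtual ~RefCounted() = default;
private:
    long count_ = 0;
};

// Hypothetical smart pointer: wraps a raw pointer and keeps the
// pointed-to object's counter in step with the number of wrappers.
template <class T>
class SmartPtr {
public:
    explicit SmartPtr(T* p = nullptr) : ptr_(p) { if (ptr_) ptr_->AddRef(); }
    // Copying a wrapper increments the counter on the target.
    SmartPtr(const SmartPtr& other) : ptr_(other.ptr_) { if (ptr_) ptr_->AddRef(); }
    // Destroying a wrapper decrements it and frees the target at zero.
    ~SmartPtr() { if (ptr_ && ptr_->Release()) delete ptr_; }
    // Assignment is deliberately left out to keep the sketch short.
    SmartPtr& operator=(const SmartPtr&) = delete;
    T* get() const { return ptr_; }
private:
    T* ptr_;
};

// Hypothetical payload that tracks how many instances are alive.
struct Widget : RefCounted {
    static int alive;
    Widget() { ++alive; }
    ~Widget() override { --alive; }
};
int Widget::alive = 0;
```

Constructing one SmartPtr<Widget> and copying it once takes the counter from 1 to 2. When the copy goes out of scope the counter drops back to 1, and when the last wrapper is gone the Widget itself is deleted.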

Smart pointers must be very careful in multi-threaded environments. The act of copying itself must be atomic - very atomic. How does a smart pointer copy itself? First, memory is set aside for a new instance of the smart pointer class, and then the copy constructor initializes the instance variables. The only instance variable is the pointer to the thing of interest. Once the pointer is copied, the constructor increments the reference counter in the pointed-to object. What would happen if the original smart pointer’s destructor were called in the instant between the copy of the pointer and incrementing the reference counter? It is possible that the referenced object could be destructed before the copy gets the chance to increment the reference counter.
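A real race is hard to reproduce on demand, so the sketch below replays that unlucky ordering on a single thread instead. FakeObject and ReplayUnluckyInterleaving are hypothetical names of mine. The object only records that it would have been destroyed rather than actually freeing anything, which keeps the replay well-defined.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted object. Instead of freeing
// memory it sets a flag, so touching it "after destruction" is observable
// without invoking undefined behavior.
struct FakeObject {
    long count = 1;                      // one owner: the original smart pointer
    bool destroyed = false;
    bool touched_after_destroy = false;

    void AddRef() {
        if (destroyed) touched_after_destroy = true;
        ++count;
    }
    void Release() {
        if (--count == 0) destroyed = true;
    }
};

// Replays the three steps of the bad interleaving in a fixed order.
// Returns true if the copy ended up incrementing a destroyed object.
inline bool ReplayUnluckyInterleaving() {
    FakeObject obj;
    FakeObject* original = &obj;   // raw pointer held by the original smart pointer

    FakeObject* copy = original;   // step 1 (copying thread): raw pointer is copied
    original->Release();           // step 2 (owning thread): original is destructed, count hits 0
    copy->AddRef();                // step 3 (copying thread): the increment arrives too late

    return obj.touched_after_destroy;
}
```

ReplayUnluckyInterleaving() returns true: by the time the copy increments the counter, the object has already been marked destroyed. In real code step 3 would be a write into freed memory.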

Helix Producer Smart Pointers Appear to be Vulnerable

Check out the copy constructor from Helix Producer’s smart pointer code in producersdk/common/include/hxtsmartpointer.h:

00113 template<class T>
00114 inline CHXTSmartPtr<T>::CHXTSmartPtr( const CHXTSmartPtr<T> &spCopy ) :
00115         m_ptr( spCopy.m_ptr )
00116 {
00117         if ( m_ptr )
00118         {
00119                 m_ptr->AddRef();
00120         }
00121 }

The actual pointer is copied in the initializer on line 00115. Then, in the body of the copy constructor, if the pointer is not zero, the referenced object’s counter is incremented. The member function “AddRef” is implemented in producersdk/common/include/hxtunknown.h as:

        STDMETHOD_(ULONG32, AddRef) (void)
        {
                return InterlockedIncrement( &m_lCount );
        }

A little spelunking reveals that “InterlockedIncrement” is an atomic operation defined in a Microsoft API. This function call is properly protected in a multi-threaded environment.
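For readers without the Microsoft API at hand, the same guarantee can be sketched with std::atomic from the C++ standard library. This is my own stand-in and not what Helix does. Counted and HammerAddRef are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical portable counterpart to the AddRef shown above.
class Counted {
public:
    // fetch_add returns the previous value, so add 1 to report the new
    // count, which is what InterlockedIncrement returns.
    long AddRef() { return count_.fetch_add(1) + 1; }
    long Count() const { return count_.load(); }
private:
    std::atomic<long> count_{0};
};

// Calls AddRef from several threads at once and returns the final count.
// With an atomic counter no increment is lost.
inline long HammerAddRef(int threads, int per_thread) {
    Counted c;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&c, per_thread] {
            for (int i = 0; i < per_thread; ++i) c.AddRef();
        });
    }
    for (auto& th : pool) th.join();
    return c.Count();
}
```

HammerAddRef(4, 10000) always returns 40000. Note what this does and does not show: the increment itself is safe, but nothing here protects the window between copying the raw pointer and calling AddRef.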

As I stated above, though, I believe that both the copy of the referenced pointer and the increment must be protected. How is data shared between threads? It doesn’t seem to matter which thread invokes the copy constructor; either way, the code is vulnerable.

Scenario 1: If a reference to a smart pointer is given to a thread and that thread invokes the copy constructor to make its own copy, then it is vulnerable to accessing deallocated memory. The original smart pointer could have been destructed between the time of the internal pointer’s assignment and incrementing the reference counter.

Scenario 2: If a master thread makes its own copy of the smart pointer object for use by a child thread, then the master thread must explicitly take care not to allow the copy to go out of scope while the child thread still lives. If the master thread lets the smart pointer destructor be invoked, then the child thread has an invalid copy. If the child thread were to make its own copy, then we’re right back to scenario 1.
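One way around scenario 2 is to hand the child thread a copy that the child itself owns, made before the thread starts running. The sketch below uses std::shared_ptr from the standard library as a stand-in, since I have not checked how CHXTSmartPtr would behave here. RunChildWithOwnReference is a hypothetical name.

```cpp
#include <cassert>
#include <memory>
#include <thread>

// The master creates the object, gives the child its own reference,
// and then drops its reference right away. Returns what the child read.
inline int RunChildWithOwnReference() {
    auto data = std::make_shared<int>(42);
    int seen = 0;

    // Capturing `data` by value copies the smart pointer on the master
    // thread, before the child starts. The copy lives inside the thread's
    // callable, so the child owns it for its whole lifetime.
    std::thread child([data, &seen] { seen = *data; });

    data.reset();   // master lets go; the child's copy keeps the int alive
    child.join();   // join also makes the write to `seen` visible here
    return seen;
}
```

RunChildWithOwnReference() returns 42 even though the master released its reference before the child was guaranteed to have run. The design choice is that the copy is made in one thread only and then ownership is transferred, so the master never destructs a pointer the child depends on.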

Does This Really Happen in Helix Producer?

I do not know. I will need to study how the threading model works in our target application. Perhaps we’ll be lucky and reference-counted objects are never shared between threads, but I doubt it.

Further problems with CHXTSmartPtr

CHXTSmartPtr’s assignment operator fails to take into account the tautological “self assignment”.

CHXTSmartPtr<SomeCHXTUnknownDerivative> p(new SomeCHXTUnknownDerivative);
CHXTSmartPtr<SomeCHXTUnknownDerivative>& q(p);  // q is just another name for p
p = q;  // self assignment: deallocated memory referenced

Granted, code as direct as this would not be written. However, it is not unfathomable for the provenance of the R-value to be unknown. In such a situation, a self assignment could be completely inadvertent but ultimately disastrous.
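For comparison, here is a sketch of an assignment operator that survives self assignment. I am assuming the failure comes from releasing the old pointer before taking a reference on the new one. Node, SafePtr and SelfAssignCount are hypothetical names, and none of this is the actual CHXTSmartPtr code.

```cpp
#include <cassert>

// Hypothetical reference-counted object that deletes itself at zero
// and counts how many instances have been destroyed.
struct Node {
    long count = 0;
    static int destroyed;
    void AddRef() { ++count; }
    void Release() { if (--count == 0) { ++destroyed; delete this; } }
};
int Node::destroyed = 0;

template <class T>
class SafePtr {
public:
    explicit SafePtr(T* p = nullptr) : ptr_(p) { if (ptr_) ptr_->AddRef(); }
    SafePtr(const SafePtr& o) : ptr_(o.ptr_) { if (ptr_) ptr_->AddRef(); }
    ~SafePtr() { if (ptr_) ptr_->Release(); }

    SafePtr& operator=(const SafePtr& rhs) {
        // Take the new reference BEFORE dropping the old one. On self
        // assignment the count goes 1 -> 2 -> 1 and never touches zero.
        T* incoming = rhs.ptr_;
        if (incoming) incoming->AddRef();
        if (ptr_) ptr_->Release();
        ptr_ = incoming;
        return *this;
    }
    T* get() const { return ptr_; }
private:
    T* ptr_;
};

// Performs the self assignment from the text and returns the
// reference count afterwards.
inline long SelfAssignCount() {
    SafePtr<Node> p(new Node);
    SafePtr<Node>& q = p;   // q is another name for p
    p = q;                  // inadvertent self assignment
    return p.get()->count;
}
```

SelfAssignCount() returns 1, and the Node is destroyed exactly once, when p goes out of scope at the end of the function. An explicit check such as `if (this == &rhs) return *this;` would work as well; the ordering trick has the small advantage of also covering two different wrappers that point at the same object.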

Doxygen Rocks

Friday, June 8th, 2007

Consider the enormity of the problem. I’ve got a pile of essentially uncommented C++ code from the Helix Producer code in front of me. I need to understand its structure and purpose. I start by whipping up a Python program to ferret out the class hierarchy. It works pretty well, only getting confused by some of the more complicated class declarations that are hidden in preprocessor directives. The output of the program is rough HTML - no clever formatting, just page after page outlining three thousand one hundred six classes. Ugh.

I decided that I needed to see some class hierarchy diagrams. Of course, I jump to GraphViz to try to remember how the “dot” language works. On that Web site, I encounter something interesting: Doxygen, a documentation tool for several programming languages. I look at it and see lots of references to specially formatted comments and then make the faulty assumption that it works only if the code’s comments follow the special format. I decide to drop further investigation.

However, a code sample in Doxygen’s documentation catches my eye. There is an example of C++ code with a class name identical to a class name in my pile of Helix code. Digging a little deeper, I find several other classes that match, too. Is Doxygen actually using the same code that I have for its examples? Well, of course not. I used Google to search for those class names on the internet in general and found lots of references to Microsoft Web sites. It turns out that Helix Producer is pretty tightly coupled to a Microsoft API. I had no idea. Some code is reproduced literally. Is this a copyright problem for this Open Source project? I’m assured by others that it is not.

Back to Doxygen: I dropped it until our overlords at our funding institution mentioned it. I was told I should revisit Doxygen. After the meeting, I came back to my machine and did:

sudo apt-get install doxygen
sudo apt-get install doxygen-gui

Within ten minutes, I had fantastic documentation for the important part of our project. It turns out that Doxygen does not need the specially formatted comments to produce this kind of documentation. It actually parses the source code to analyze the structures. It then even uses GraphViz to create the hierarchy diagrams.

Without a doubt, I find Doxygen to be one of the coolest tools available to a programmer.