Archive for the ‘Open Source’ Category

it’s a geeky meme

Sunday, April 13th, 2008

lars@bozeman:~$ history|awk '{a[$2]++} END{for(i in a){printf "%5d\t%s\n",a[i],i}}'|sort -rn|head
133 cd
114 ls
44 svn
31 vi
28 python
24 ssh
21 ./ConfigurationManager.py
17 make
13 rsync

It looks to me like I spend too much time moving around the file system. I should try to type more pathnames and stick around in one place…
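For comparison, the same tally can be sketched in Python. This is only an illustration: `count_commands` is a made-up name, and it assumes the input lines look like the output of bash's `history` builtin, with an entry number in the first column and the command name in the second.

```python
from collections import Counter

def count_commands(history_lines, top=10):
    """Tally the second whitespace-separated field of each line (roughly awk's $2)."""
    counts = Counter()
    for line in history_lines:
        fields = line.split()
        if len(fields) >= 2:          # skip blank or malformed lines
            counts[fields[1]] += 1
    return counts.most_common(top)    # most frequent first, like sort -rn | head

# Made-up sample input standing in for real shell history
sample = [
    "  101  cd src",
    "  102  ls -l",
    "  103  cd ..",
    "  104  svn status",
]
for command, count in count_commands(sample):
    print("%5d\t%s" % (count, command))
```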

Sanity Compromised by Firefox and ssh X Forwarding

Thursday, March 6th, 2008

Try this in Linux: open Firefox on your local machine. Then open a terminal window and ssh to another machine using the -X option for X forwarding. On the remote machine, start Firefox. The behavior I get is so bizarre that it cannot be a bug — somehow this looks intentional. The Firefox process on the remote machine sits for a few moments and then dies. Then a new local Firefox window opens. WTF?

I thought I was going insane. The people at the OSL that I told about this thought I was insane. The Mozilla developers that I work with and tried to explain this to thought I was insane.

Some research shows this: the remote Firefox actually starts and communicates with the X server running on the local machine. The X server tells the remote Firefox that there is already a process called Firefox running. The remote Firefox then sends a message to the local one to open a new window and then the remote Firefox dies. This protects a user from creating too many instances of a Firefox process on their machine. Clever, huh? But totally WRONG and counterintuitive!

Apparently you can stop this behavior if you start the remote Firefox with the intuitively named “-no-remote” switch. That prevents the remote Firefox from “connecting” to the local Firefox.
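A minimal sketch of the workaround, written as a small Python helper that builds the command line. The host name `remotehost` and the helper name are placeholders invented for illustration; `-X` is the ssh X forwarding option and `-no-remote` is the Firefox switch mentioned above.

```python
import shlex

def remote_firefox_command(host):
    """Build an ssh command that starts an independent Firefox on `host`."""
    # -X forwards X11; -no-remote stops Firefox from handing off to a running instance
    return ["ssh", "-X", host, "firefox", "-no-remote"]

command = remote_firefox_command("remotehost")   # "remotehost" is a placeholder
print(" ".join(shlex.quote(part) for part in command))
# To actually launch it, pass the list to subprocess.call(command)
```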

Sigh, there goes an hour of my life that I’d like to have back…

a Pythonic Ospid

Monday, February 11th, 2008

I’m suffering an ospid: I wrote some code last weekend that I keep looking at over and over again because I like it so much.

I’ve got a relational database schema that looks like this:

For this blog posting, I am only interested in the first six tables of the top cascade of tables and the ‘updateParameters’ table just below them.

I’m trying to populate this schema with its initial data by walking a filesystem tree. I search for files within the filesystem fetching each file’s pathname. The directories in a pathname correspond to values in the cascading tables.

listOfTables = ['product','version','buildTarget','buildId','locale','channel']

I wrote a function that takes the name of a directory as an argument. The function’s objective is to put the directory name into an appropriate table whenever the value isn’t already there. I could have written the function such that the target table name is also a parameter to the function, but I took a different path instead. I decided that each table should have its own function. This didn’t mean that I had to individually write the function for each table: I could get Python to do that for me.

def getInsertFunctionForTable(tableName,  databaseConnection, cache,
                              insertSqlTemplate = genericInsertSql,
                              fetchSqlTemplate = genericFetchIdSql):
  insertSql = insertSqlTemplate.replace('TABLENAME', tableName)
  fetchSql = fetchSqlTemplate.replace('TABLENAME', tableName)
  def insertIntoTable(value):
    try:
      return cache[tableName][value]
    except KeyError:
      databaseConnection.executeSql(insertSql % value)
      id = databaseConnection.singleValueSql(fetchSql % value)
      cache[tableName][value] = id
      return id
  return insertIntoTable

In this code, I define a function that, when given the name of a table, will return another function. This second, returned function is the one that I described earlier. I take my list of table names and use a list comprehension to create a second list of functions, each appropriate for handling one of the directories in a pathname.

import collections

databaseConnection = ...
cache = collections.defaultdict(dict)
databaseInsertFunctions = [ getInsertFunctionForTable(x, databaseConnection, cache) for x in listOfTables ]

Now I can take a pathname and my list of functions and use another list comprehension to process them:

pathname = 'firefox/2.00.12/linux-gcc3.1/2008020101/en/somechannel/file.txt'
idForPathname = [insert(directory) for insert, directory in zip(databaseInsertFunctions, pathname.split('/'))]

The result is a list of the database ids of the directory names in their respective tables.
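The pieces so far can be tried end to end with a self-contained sketch. Everything about the database here is an assumption made for illustration: it uses an in-memory SQLite database from the standard library rather than the connection object in the post, and it invents a two-column layout (`id`, `name`) for each table, since the real schema and SQL templates are not shown.

```python
import collections
import sqlite3

list_of_tables = ['product', 'version', 'buildTarget', 'buildId', 'locale', 'channel']

def get_insert_function_for_table(table_name, connection, cache):
    """Return a function that stores a value in table_name once and returns its id."""
    # Table names come from a fixed list, so interpolating them is safe here
    insert_sql = 'INSERT INTO "%s" (name) VALUES (?)' % table_name
    fetch_sql = 'SELECT id FROM "%s" WHERE name = ?' % table_name
    def insert_into_table(value):
        try:
            return cache[table_name][value]      # seen before: no database work
        except KeyError:
            connection.execute(insert_sql, (value,))
            row_id = connection.execute(fetch_sql, (value,)).fetchone()[0]
            cache[table_name][value] = row_id
            return row_id
    return insert_into_table

connection = sqlite3.connect(':memory:')
for table in list_of_tables:
    # Hypothetical table layout, invented for this sketch
    connection.execute(
        'CREATE TABLE "%s" (id INTEGER PRIMARY KEY, name TEXT UNIQUE)' % table)

cache = collections.defaultdict(dict)
insert_functions = [get_insert_function_for_table(t, connection, cache)
                    for t in list_of_tables]

def ids_for_pathname(pathname):
    # zip() stops at the shorter sequence, so the trailing file name is ignored
    return [insert(directory)
            for insert, directory in zip(insert_functions, pathname.split('/'))]

first = ids_for_pathname('firefox/2.00.12/linux-gcc3.1/2008020101/en/somechannel/file.txt')
second = ids_for_pathname('firefox/2.00.12/linux-gcc3.1/2008020102/en/somechannel/file.txt')
print(first)
print(second)
```

Only the build id differs between the two pathnames, so only the fourth id changes; every other value is served from the cache on the second call.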

As it happens, this is the value that I need to populate the next table in my diagram. Now I can use the same function again for this next table:

updateParametersInsertFunction = getInsertFunctionForTable('updateParameters', databaseConnection, cache,
                       updateParametersInsertSql, updateParametersFetchIdSql)

Using that idea, I can process an entire tree of data, inserting all the values into all the tables with this loop:

for path, name, pathname in cse.FileSystem.findFileGenerator(root,lambda a: a[1] == 'complete.txt' ):
  updateParametersId = updateParametersInsertFunction(tuple(insert(directory) for insert, directory in zip(databaseInsertFunctions, path.split('/'))))
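The `cse.FileSystem.findFileGenerator` call belongs to a library whose source is not shown in this post. Judging only from how the loop uses it, it yields `(path, name, pathname)` triples for files accepted by a predicate. A rough stand-in built on `os.walk` might look like the following; the name `find_file_generator` and its exact behavior are guesses, not the real implementation.

```python
import os
import tempfile

def find_file_generator(root, accept=lambda triple: True):
    """Yield (path, name, pathname) for each file under root that accept() approves."""
    for path, _dirs, names in os.walk(root):
        for name in names:
            triple = (path, name, os.path.join(path, name))
            if accept(triple):
                yield triple

# Demo: build a tiny throwaway tree and look for files named complete.txt
root = tempfile.mkdtemp()
leaf = os.path.join(root, 'firefox', '2.00.12')
os.makedirs(leaf)
for filename in ('complete.txt', 'partial.txt'):
    open(os.path.join(leaf, filename), 'w').close()

found = [os.path.relpath(path, root)
         for path, name, pathname in find_file_generator(root, lambda a: a[1] == 'complete.txt')]
print(found)
```

The demo reports paths relative to `root`, because the directory names only line up with the list of tables when the path starts at the product directory.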

I keep looking at this over and over again. I really like it.

The actual software that I wrote was a touch more complicated. I added the capability to translate values in the tables with a reference to a translation function. I also took into account the rest of the tables that I’ve not mentioned in this posting.
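The post does not show how the translation functions were wired in. One way it could be done, offered purely as a guess, is to wrap an insert function in another closure that translates the value first. The names `with_translation` and `fake_insert` and the locale normalization rule are all invented for this sketch.

```python
def with_translation(insert_function, translate):
    """Wrap an insert function so each value is translated before it is stored."""
    def translating_insert(value):
        return insert_function(translate(value))
    return translating_insert

# Stand-in for a database insert function: remembers values and hands out ids
seen = {}
def fake_insert(value):
    return seen.setdefault(value, len(seen) + 1)

# Hypothetical translation: treat 'en_US' and 'en-us' as the same locale
insert_locale = with_translation(fake_insert, lambda v: v.replace('_', '-').lower())
print(insert_locale('en_US'), insert_locale('en-us'), insert_locale('fr'))
```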

A Left Shifted Zero is Still a Zero

Thursday, October 11th, 2007

I had to look at the line twice because I couldn’t believe what I was seeing. On the second inspection, it was clear that I would need to look several more times: I just couldn’t actually be seeing what it looked like I was seeing.

#3 0xb7992978 in HXAssertFailedLine (
pszExpression=0xb7a97e60 "((HRESULT) (((unsigned long)(0)<<31) | ((unsigned long)(0)<<16) | ((unsigned long)(0))) ) == pContext->QueryInterface(IID_IHXScheduler, (void**) &m_pScheduler)", pszFileName=0xb7a97e3e "hxpref.cpp", nLine=172) at hxassert.cpp:471

This is a line from a debugger. The code that I’m debugging stopped with an assertion failure on that line. While the code that interested me was further on in the stack trace, this line was so absurd that I had to investigate it.

The code takes the constant zero, casts it to an unsigned long and then left shifts it 31 bits. It then takes another zero cast to an unsigned long and shifts it 16 bits to the left. These two results are bitwise OR’d together with a third zero cast to an unsigned long. The result, which is 0 as an unsigned long, is then cast to some type called HRESULT. Digging into the nested macro definitions, I see that HRESULT is defined as LONG32. LONG32 is a typedef for FIXED32. FIXED32 is a typedef for INT32. INT32 is a typedef for int. We’re taking an unsigned long zero, and then unceremoniously truncating it to be a signed integer. It’s a miracle: three unsigned long zeros have magically become one signed integer zero.

The big question is why? How could code like this get written?

First I must say that any optimizing C++ compiler is going to resolve this absurdity at compile time. The compiler will throw all that crap away and just replace it with a zero. It may even go one step further and get rid of that, too. So ultimately, it neither matters to the compiler nor compromises runtime efficiency. This essay is just an entertaining excursion into what seems like a comic farce.

The original line in the source code looks like this:

HX_VERIFY(HXR_OK == pContext->QueryInterface(IID_IHXScheduler, (void**) &m_pScheduler));

The offending token is the HXR_OK. This is a macro invocation. The macro is defined as:

#define HXR_OK MAKE_HRESULT(0,0,0)

The MAKE_HRESULT macro is defined as:

#define MAKE_HRESULT(sev,fac,code) \
((HRESULT) (((unsigned long)(sev)<<31) | ((unsigned long)(fac)<<16) | \
((unsigned long)(code))) )

This whole absurdity is the result of bit-packing. The coders wanted to take three values that described an error, the sev (severity), fac (facility) and code (error code), and pack them into one integer. In the header file that defines both HXR_OK and MAKE_HRESULT, I can see a number of other constants defined:

#define HXR_ABORT MAKE_HRESULT(1,0,0x4004) // 80004004
#define HXR_FAIL MAKE_HRESULT(1,0,0x4005) // 80004005
#define HXR_ACCESSDENIED MAKE_HRESULT(1,7,0x0005) // 80070005

So in each case, the code defines a constant in such an obfuscated manner that the programmer saw fit to add a comment to each line to show the hexadecimal value that he really intended and hopefully, what the macro eventually expresses.
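The hexadecimal comments can be checked with a short Python sketch of the same packing. The function names are invented here, and the conversion to a signed value assumes the final cast lands in a 32-bit two's complement int, which is how the typedef chain described above usually plays out.

```python
def make_hresult(sev, fac, code):
    """Pack severity, facility and code the way the MAKE_HRESULT macro does."""
    packed = ((sev << 31) | (fac << 16) | code) & 0xFFFFFFFF
    # Mimic the final cast to a signed 32-bit HRESULT (assumes a 32-bit int)
    return packed - 0x100000000 if packed & 0x80000000 else packed

def as_hex(hresult):
    """Show the unsigned 32-bit pattern, as in the header's comments."""
    return '%08X' % (hresult & 0xFFFFFFFF)

HXR_OK = make_hresult(0, 0, 0)
HXR_ABORT = make_hresult(1, 0, 0x4004)
HXR_FAIL = make_hresult(1, 0, 0x4005)
HXR_ACCESSDENIED = make_hresult(1, 7, 0x0005)

for name, value in [('HXR_OK', HXR_OK), ('HXR_ABORT', HXR_ABORT),
                    ('HXR_FAIL', HXR_FAIL), ('HXR_ACCESSDENIED', HXR_ACCESSDENIED)]:
    print(name, as_hex(value), value)
```

With the severity bit set, the signed values come out negative, which is why the comments in the header give the bit pattern instead.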

There are several hundred constants defined in this manner. I see no facility to take these return codes apart to access any of the three values so painstakingly pieced together. Why go to all the trouble to tightly pack an encoding of these three values if you never unpack them? Is this simply a case of over-engineering?
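For illustration, the missing unpacking facility would only take a few lines. The sketch below is hypothetical and not taken from the code base: the name `unpack_hresult` is invented, and the field widths are inferred solely from the shift amounts in MAKE_HRESULT (severity at bit 31, facility starting at bit 16, code in the low 16 bits).

```python
def unpack_hresult(hresult):
    """Split a packed status code back into (severity, facility, code)."""
    bits = hresult & 0xFFFFFFFF        # work on the unsigned 32-bit pattern
    severity = (bits >> 31) & 0x1
    facility = (bits >> 16) & 0x7FFF   # the 15 bits between the code and the severity bit
    code = bits & 0xFFFF
    return severity, facility, code

print(unpack_hresult(0x80070005))   # the value in the HXR_ACCESSDENIED comment
print(unpack_hresult(0))            # the degenerate all-zero case
```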

There is a clue in the same header file in which these abominations are defined. It turns out that the macro definition of MAKE_HRESULT is actually in one branch of a conditional compilation unit. The other branch does not define MAKE_HRESULT. Instead, it includes <winerror.h>. Ah, it all becomes clearer now. This is an artifact of compatibility with Microsoft Windows. Microsoft must have defined a system of function return status codes that follows this form. So here we are with an Open Source project to be used on the OLPC, Linux systems, OS/X and numerous mobile platforms bound to a convention required by Microsoft. Is there no escape from this evil octopus?

Looking back to my debugger, I now see the purpose of the shifted zeros. It just happens to be the degenerate case of what is supposed to be a flexible system to stuff multiple pieces of information into one scalar variable. Someday a future programmer can change how these status codes are assembled simply by changing the definition of the macro. Of course, that will make the comments incorrect. I will wager that the refactoring day will never come and this piece of flexible engineering will never get the chance to flex.

No matter how much I wash, my hands are unclean

Monday, July 9th, 2007

Here’s a C++ issue that’s been stumping me all morning. I’ve got a compiler error in a header file that states that a particular symbol is not declared within the current scope. What I’ve been able to figure out is that the compiler is right: I certainly cannot find any way that this symbol is available anywhere that any compiler would be able to find it. A mitigating factor is that the missing symbol is within the declaration of a template.

Grepping across all files in the project, I can find the symbol repeated in many files as a local static constant.

static const char kValueDescription[] = "Writes Null Files";

This same definition appears in many files, and each file has its own unique string. So these definitions of my missing symbol are available only after the inclusion of my header file. Just to test, I moved the include of the header from the top of the cpp file to a point below the alleged definition of my missing symbol: the missing symbol error goes away. It just seems plain wrong to require something to be defined prior to inclusion of a header file. A header file should be self-contained: it should forward declare classes it can’t know details about or include other headers to get the declarations that it needs.

So here’s what I think is going on: the code is banking on the assumption that the compiler will blindly treat the template as if it were a macro. The original author wanted the compiler to not process the template until it is actually instantiated (coincidentally immediately after the declaration/definition of the missing symbol). In my case, the compiler is not cooperating and, as far as the source file is concerned, is compiling the template prematurely.

I know that the GNU C++ compiler has had a bug in the past where templates were instantiated prematurely. I’m not yet sure if this is an instance of this problem. I’ve seen the premature template instantiation problem manifest as an unlawful symbol redefinition, not a missing symbol. The question is, should a compiler fully check a template prior to instantiation? I would say “yes”; unfortunately, this code says “no.”

I need to find a solution. Either I have to make the new location of the include permanent, an option I find distasteful, or I need to find a way to forward declare the static const char array in the header file. Unfortunately, forward declarations are for classes, not data items. How do you forward declare data?

So I am forced to be unclean. I’ve moved the header inclusion from the top of the file where it belongs down into the middle of the source file. This makes me ill with foreboding that there will be more trouble in the future from this.

The return of the language lawyer

Thursday, July 5th, 2007

Now that the nursery work is virtually complete for me, I can reallocate that part of my brain to my real career. Today, I reconnected with a part of my past: I woke the C++ language lawyer in me. I haven’t seen him in ten years.

I have this big pile of C++ code in front of me representing a development project from the late nineties. It is a wild mixture of coding styles embracing complex preprocessor systems for class declarations simulating templates, real templates, multiple inheritance, private inheritance, a roll-your-own runtime type identification system, all wrapped around a chewy Microsoft COM API. There are several hundred classes in several hundred files riddled with conditional compilation directives switching on everything from Win16 to Solaris. Few declarations are not wrapped in defines while most types are typedef’d or macro’d within an inch of their lives. Oh yeah, it’s got some threading, too.

While a branch of this code allegedly compiles on Linux under GNU 3.x, I am trying to use version 4.1.x to compile it. I am told that the 4.x GNU compilers are significantly pickier than the version 3.x compilers. The actual code hasn’t been touched in at least four years. Honestly, it looks as if the team of programmers assigned to this code were unexpectedly escorted from the building during a mass layoff. Several files look to be halfway through a refactoring effort. I find undeclared variables, misspelled enumerations, missing and ambiguous scoping, unused parameters and many other problems.

My task is to make it all compile because it must be drafted into use. I’m all for recycling and have embraced the object oriented code reuse credo since 1988, but I am taken aback by the complexity of this task. It may be that the original coders were too clever several times over. There are some nether regions of the C++ standard that are terrifyingly beautiful with fractal complexity: but I would think twice about using them in production code. I must say that the coder’s intent is rendered rather opaque by the language.

I’ve been here before. However, I was on the other side: I have written opaque code while enthralled with the brilliant yet twisted beauty of the underlying structure. I wrote it during the same era in which this code originated. No documentation would ever be needed for it: it’s so obvious, “a child could do it”.

With age comes a modicum of wisdom. I know where these people and, indeed, I myself, have gone wrong in the past. Code must be written knowing that the high priest will pass on. It is time for me to pay for the follies of my youth. I walk to my task of atonement willingly and with my eyes open. When my delayed penance is served and I am free once more, I will return to Python coding with an eye for those who will come after me.

Lost History?

Thursday, September 29th, 2005

Several years ago, when helping clean out my father’s house before it was sold, I ran across an old trunk. Inside, I found hundreds of papers: the notes and articles written by my great uncle Larry Rue, a World War II war correspondent in Europe. The papers covered roughly thirty years of his career and included a personal letter from Herbert Hoover. I spent hours reading through those fascinating papers. Aside from some messy handwriting, nothing about these papers had become obsolete or unreadable.

I wish I could say the same for my own “papers.” It was 1984 when I first put a word processor on a computer. I started writing volumes. Every letter, every journal entry and every note ended up with a dot doc extension. As the years passed, I upgraded and moved my documents to new machines. By 1990, when in graduate school, I was using Word for Windows under Windows/386 (a nearly forgotten precursor to Windows 3.0). At that point, I discovered that I was no longer able to read the files that I had written only five years earlier. Another fifteen years have passed and now I can’t read my own graduate thesis document.

As their products evolved, Microsoft made conscious decisions to drop support for older document formats. The decisions were made for business or technical reasons for their own benefit and convenience, not mine. Future file formats will allegedly be patented, ensuring that no matter what software reads these documents in the future, Microsoft’s corporate palm must at some time be crossed with silver.

By endorsing OpenDocument v. 1.0, the Commonwealth of Massachusetts has mandated that all government documents must be in an open format. Apparently, a state government has noticed that important documents from the past aren’t readable any more. Apparently, they also noticed that a single company could potentially put a toll booth on the discourse between citizens and their government. This is a great decision. Not only does it open the possibility that historians and archivists in the future will have access to the information, but it levels the playing field for different vendors right now. Anyone could write a document processor to read these documents. For the first time in years, perhaps we can see some real competition in the field of office document programs.

As a citizen, shouldn’t I be entitled to access the information and documents generated by my government? Isn’t this a rather basic right? No corporation should have to be paid in order for me to exercise my rights.

In the realm of my own life, I find it repugnant that a corporate toll booth could sit between me and my own journal and correspondence. I’m certainly glad nothing stood between me and my Uncle Larry. Perhaps it is just vanity, but I’d like to think that my painstakingly transcribed thoughts will someday be a found treasure. But as it stands right now, it is just a bunch of uselessly cryptic but beautifully organized binary files.

How I learned to stop worrying and love OSS

Thursday, July 21st, 2005

I am a software developer. I’ve had a long and fruitful career both as a corporate employee and an independent consultant. In 1999, I took a left turn with my career and opened a nursery specializing in rose bushes. I took a couple of older computers discarded by my software consulting business and dedicated them to the new business. Using Microsoft Access, Visual Studio C++ and libraries licensed from Rogue Wave Software, I created the software to control the entire business: inventory management, customer management, order fulfillment, web site generation and accounting. Most of my waking hours were dedicated to the evolution of this all encompassing small business software package.

In 2001, as my business was growing rapidly, I got a letter from Microsoft and the Business Software Alliance (BSA). I was told that there was pirated software in use in my business and I was potentially liable for thousands of dollars in damages to Microsoft. Reading on, I was informed that the BSA has the right to audit my business at any time and if they found illegally copied software the consequences would be dire for my business.

I was dumbfounded. I began an audit of my own operations. I realized I was missing some paperwork. The machine that I used with Microsoft Access was purchased in 1997 by my software consulting business. While I had the original receipt for the machine that showed that I had purchased the Small Business package that included Access, I could not find the Certificate of Authenticity for MS Office.

I went back to the letter from the BSA. According to them, there was some good news: an amnesty program. I could get legitimate licenses for my software at discounted rates directly from Microsoft. It would be so convenient, they provided an 800 number as well as a Web site. They suggested that I take advantage of their offer immediately, because the BSA is auditing thousands of businesses every year.

I considered biting their hook. But some research showed that Access had changed significantly between the 97 and 2000 versions. I would have to re-implement the front end of my software to make it work with Access 2000. The price for the new software coupled with the lost time and expense of re-engineering the front end was too much. In the midst of the nursery’s busy shipping season, I had no time to dedicate to anything but order fulfillment, inventory care and keeping the irrigation system running.

I remember sitting at my desk one evening pondering how to resolve this problem. Then I noticed the envelope that the original letter had come in: it was addressed to “Ms. Rose Uncommon”. My business is called “The Uncommon Rose”. There is no one here named Ms. Rose Uncommon. We get junk solicitation mail for this non-existent person all the time. It was suddenly clear that they had purchased a bulk mailing list and carpet bombed everyone on it. Microsoft had adopted a policy of marketing by blackmail: “Buy our products or we’ll sue you!”

My business is my livelihood: it puts food on my table. It was time to swat Microsoft’s hand out of my revenue stream.

I stopped worrying about it until the end of the shipping season that year. I acquired a couple of no-name computers and loaded them with Red Hat Linux. I learned about PostgreSQL, the Python language and Open Office. I cobbled together a replacement for all Microsoft products before the beginning of the next shipping season.

It’s now three years later and the business runs entirely on computers running Linux. OSS enables the Uncommon Rose to thrive without the yoke of a mega-corporation whose agenda I neither understand nor trust. I’m no longer so intimately involved in the day to day operation of the nursery: I have employees for that. With my position at the OSUOSL, I have a great opportunity to contribute back to the community that helped me so much.