Archive for February, 2008

a Pythonic Ospid

Monday, February 11th, 2008

I’m suffering an ospid, I wrote some code last weekend that I keep looking at over and over again because I like it so much.

I’ve got relational database schema that looks like this:

For this blog posting, I am only interested in the first six tables of the top cascade of tables and the ‘updateParamters’ table just below them.

I’m trying to populate this schema with its initial data by walking a filesystem tree. I search for files within the filesystem fetching each file’s pathname. The directories in a pathname correspond to values in the cascading tables.

listOfTables = ['product','version','buildTarget','buildId','locale','channel']

I wrote a function that takes the name of a directory as an argument. The function’s objective is to put the directory name into an appropriate table whenever the value isn’t already there. I could have written the function such that the target table name is also a parameter to the function, but I took a different path instead. I decided that each table should have its own function. This didn’t mean that I had to individually write the function for each table, I could get Python to do that for me.

def getInsertFunctionForTable(tableName,  databaseConnection, cache,
                              insertSqlTemplate = genericInsertSql,
                              fetchSqlTemplate = genericFetchIdSql):
  insertSql = insertSqlTemplate.replace('TABLENAME', tableName)
  fetchSql = fetchSqlTemplate.replace('TABLENAME', tableName)
  def insertIntoTable(value):
    try:
      return cache[tableName][value]
    except KeyError:
      databaseConnection.executeSql(insertSql % value)
      id = databaseConndatabaseInsertFunctionsection.singleValueSql(fetchSql % value)
      cache[tableName][value] = id
      return id
  return insertIntoTable

In this code, I define a function that, when given the name of table, will return another function. This second returned function is the one that I defined earlier. If I take my list of table names, and use a list comprehension to create a second list of functions appropriate for handling each of the directories in a pathname.

databaseConnection = ...
cache = collections.defaultdict(dict)
databaseInsertFunctions = [ getInsertFunctionForTable(x, databaseConnection, cache) for x in listOfTables ]

Now I can take a pathname and my list of functions and use another list comprehension to process them:

pathname = 'firefox/2.00.12/linux-gcc3.1/2008020101/en/somechannel/file.txt'
idForPathname = [x[0](x[1]) for x in zip(databaseInsertFunctions, pathname.split('/'))]

The result is a list of the database’s id for each of the directory names in their respective tables.

As it happens, this is the value that I need to populate the next table in my diagram. Now I can use the same function again for this next table:

updateParametersInsertFunction = getInsertFunctionForTable('updateParameters', databaseConnection, cache,
                       updateParametersInsertSql, updateParametersFetchIdSql)

Using that idea, I can process entire tree of data, inserting all the values into all the tables with this loop:

for path, name, pathname in cse.FileSystem.findFileGenerator(root,lambda a: a[1] == 'complete.txt' ):
  updateParametersId = updateParametersInsertFunction(tuple([x[0](x[1]) for x in zip(databaseInsertFunctions, path.split('/'))]))

I keep looking at this over and over again. I really like it.

The actual software that I wrote was a touch more complicated. I added the capability to translate values in the tables with a reference to a translation function. I also took into account the rest of the tables that I’ve not mentioned in this posting.

Old dog, new tricks

Friday, February 1st, 2008

Last Fall, I decided to learn the dvorak keyboard layout.  I didn’t take it very seriously and didn’t do so well.  After yet another bout of wrist and shoulder pain, I splurged and bought one of those expensive Kenesis “two scoops” keyboards with the switchable qwerty/dvorak layouts.  It’s been a month and I’m still really rotten at dvorak.  I use dvorak as the default, and only switch to qwerty when I’m desperate to get something done.

The toughest part are the letters: o, e, i, d, c, r.  I’m always swapping those in pairs: o for e, i for d and c for r.  I run exercises to get them into muscle memory, but they just won’t stick.

This posting has taken an inordinate amount of time to write.