a Pythonic Ospid
I’m suffering an ospid, I wrote some code last weekend that I keep looking at over and over again because I like it so much.
I’ve got relational database schema that looks like this:

For this blog posting, I am only interested in the first six tables of the top cascade of tables and the ‘updateParamters’ table just below them.
I’m trying to populate this schema with its initial data by walking a filesystem tree. I search for files within the filesystem fetching each file’s pathname. The directories in a pathname correspond to values in the cascading tables.
listOfTables = ['product','version','buildTarget','buildId','locale','channel']
I wrote a function that takes the name of a directory as an argument. The function’s objective is to put the directory name into an appropriate table whenever the value isn’t already there. I could have written the function such that the target table name is also a parameter to the function, but I took a different path instead. I decided that each table should have its own function. This didn’t mean that I had to individually write the function for each table, I could get Python to do that for me.
def getInsertFunctionForTable(tableName, databaseConnection, cache,
insertSqlTemplate = genericInsertSql,
fetchSqlTemplate = genericFetchIdSql):
insertSql = insertSqlTemplate.replace('TABLENAME', tableName)
fetchSql = fetchSqlTemplate.replace('TABLENAME', tableName)
def insertIntoTable(value):
try:
return cache[tableName][value]
except KeyError:
databaseConnection.executeSql(insertSql % value)
id = databaseConndatabaseInsertFunctionsection.singleValueSql(fetchSql % value)
cache[tableName][value] = id
return id
return insertIntoTable
In this code, I define a function that, when given the name of table, will return another function. This second returned function is the one that I defined earlier. If I take my list of table names, and use a list comprehension to create a second list of functions appropriate for handling each of the directories in a pathname.
databaseConnection = ... cache = collections.defaultdict(dict) databaseInsertFunctions = [ getInsertFunctionForTable(x, databaseConnection, cache) for x in listOfTables ]
Now I can take a pathname and my list of functions and use another list comprehension to process them:
pathname = 'firefox/2.00.12/linux-gcc3.1/2008020101/en/somechannel/file.txt'
idForPathname = [x[0](x[1]) for x in zip(databaseInsertFunctions, pathname.split('/'))]
The result is a list of the database’s id for each of the directory names in their respective tables.
As it happens, this is the value that I need to populate the next table in my diagram. Now I can use the same function again for this next table:
updateParametersInsertFunction = getInsertFunctionForTable('updateParameters', databaseConnection, cache,
updateParametersInsertSql, updateParametersFetchIdSql)
Using that idea, I can process entire tree of data, inserting all the values into all the tables with this loop:
for path, name, pathname in cse.FileSystem.findFileGenerator(root,lambda a: a[1] == 'complete.txt' ):
updateParametersId = updateParametersInsertFunction(tuple([x[0](x[1]) for x in zip(databaseInsertFunctions, path.split('/'))]))
I keep looking at this over and over again. I really like it.
The actual software that I wrote was a touch more complicated. I added the capability to translate values in the tables with a reference to a translation function. I also took into account the rest of the tables that I’ve not mentioned in this posting.
February 12th, 2008 at 7:26 am
The masses demand more tasty python tidbits from the master!
very cool.