Tagged: python

DataIO

A few weeks ago my employer helped the NY State Senate parse the MTA budget information into a machine searchable format. (The MTA originally published the budget as a PDF.) To parse the PDF I used a utility called pdftohtml to first convert the PDL into an XML document. I then used the python library lxml to convert the document into a set of csv files. The results of this labor can be seen on TOPP’s data site.

Soon after I published this data, however, I was told by a number of people that the data would be more useful if presented in another format. At first I just started creating a bunch of command line python scripts that would suck in these csv files, and spit them out in different formats. I quickly realized that I could accumulate these scripts and create a quick and dirty web application.

Over a few train rides I created an application called DataIO, and today, I finally got a chance to upload it to Google App Engine. The application is pretty simple to interact with; instructions are located on its front page.

Continue reading

Exporting Posts from WordPress 2.5

Recently I tried exporting some posts from one WordPress 2.5 blog to another WordPress 2.5 blog. The import worked perfectly, but then I realized that I had forgotten to import one of the posts. Rather than doing to the whole import again, I decided to just import the post that I had forgotten. Unfortunately, WordPress brought over all of my categories again, leaving me with duplicates of all of my categories. Ugg.

I wrote a small python script that takes a wordpress export file and strips out information about categories and tags. It’s a pretty simple script, but I will share it anyway: script