Category: DataIO
-
DataIO
A few weeks ago my employer helped the NY State Senate parse the MTA budget information into a machine searchable format. (The MTA originally published the budget as a PDF.) To parse the PDF I used a utility called pdftohtml to first convert the PDL into an XML document. I then used the python library…