<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>anilmakhijani.com &#187; lxml</title>
	<atom:link href="http://anilmakhijani.com/tag/lxml/feed/" rel="self" type="application/rss+xml" />
	<link>http://anilmakhijani.com</link>
	<description></description>
	<lastBuildDate>Mon, 06 Jun 2011 17:19:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>DataIO</title>
		<link>http://anilmakhijani.com/2009/04/12/dataio/</link>
		<comments>http://anilmakhijani.com/2009/04/12/dataio/#comments</comments>
		<pubDate>Sun, 12 Apr 2009 19:26:18 +0000</pubDate>
		<dc:creator>Anil</dc:creator>
				<category><![CDATA[DataIO]]></category>
		<category><![CDATA[Google App Engine]]></category>
		<category><![CDATA[lxml]]></category>
		<category><![CDATA[MTA]]></category>
		<category><![CDATA[planetdev]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://anilmakhijani.com/?p=244</guid>
		<description><![CDATA[A few weeks ago my employer helped the NY State Senate convert the MTA budget information into a machine-searchable format. (The MTA originally published the budget as a PDF.) To parse the PDF, I first used a utility called pdftohtml to convert it into an XML document. I then used the Python library [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago <a href="http://theopenplanningproject.org">my employer</a> helped the NY State Senate convert the MTA budget information into a machine-searchable format. (The MTA originally published the budget as a PDF.) To parse the PDF, I first used a utility called <a href="http://pdftohtml.sourceforge.net/">pdftohtml</a> to convert it into an XML document. I then used the Python library <a href="http://codespeak.net/lxml/">lxml</a> to turn that document into a set of CSV files. The results of this labor can be seen on <a href="http://data.topplabs.org">TOPP&#8217;s data site</a>.</p>
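<p>For readers curious about the lxml step, here is a minimal sketch of the idea, not the script used for the MTA budget. It assumes pdftohtml was run in XML mode (<code>pdftohtml -xml</code>), which emits <code>&lt;page&gt;</code> elements containing positioned <code>&lt;text&gt;</code> elements, and it assumes that table cells on the same visual row share roughly the same <code>top</code> coordinate. The function names <code>pdf2xml_to_rows</code> and <code>rows_to_csv</code>, the <code>row_tolerance</code> parameter, and the sample input are all hypothetical.</p>

```python
import csv
import io

try:
    from lxml import etree  # the library used for the MTA data
except ImportError:
    # Fallback so the sketch still runs without lxml; ElementTree supports
    # the small subset of the API used here (fromstring, iter, get, itertext).
    import xml.etree.ElementTree as etree


def pdf2xml_to_rows(xml_bytes, row_tolerance=3):
    """Turn `pdftohtml -xml` output into a list of rows (lists of strings).

    Hypothetical helper. Assumes cells on one visual row have `top` values
    within `row_tolerance` pixels of the first cell in that row.
    """
    root = etree.fromstring(xml_bytes)
    rows = []
    for page in root.iter("page"):
        cells = []
        for text in page.iter("text"):
            # <text> may wrap its content in <b>/<i>, so join all nested text.
            value = "".join(text.itertext()).strip()
            if value:
                cells.append((int(text.get("top")), int(text.get("left")), value))

        # Walk cells top to bottom, starting a new row when `top` jumps.
        groups = []
        row_top = None
        for top, left, value in sorted(cells):
            if row_top is None or top - row_top > row_tolerance:
                groups.append([])
                row_top = top
            groups[-1].append((left, value))

        # Within each row, order cells left to right.
        rows.extend([value for _, value in sorted(group)] for group in groups)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text (the csv module quotes embedded commas)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Made-up sample in the shape pdftohtml -xml produces; not real MTA figures.
SAMPLE = b"""<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="300" width="40" height="12" font="0">Amount</text>
<text top="101" left="50" width="80" height="12" font="0">Agency</text>
<text top="116" left="50" width="80" height="12" font="0">Agency A</text>
<text top="117" left="300" width="40" height="12" font="0"><b>123</b></text>
</page>
</pdf2xml>"""

print(rows_to_csv(pdf2xml_to_rows(SAMPLE)), end="")
```

<p>Grouping by a tolerance rather than an exact <code>top</code> match is a design choice: PDF-to-XML converters often place cells of the same row a pixel or two apart, and the tolerance value that works will depend on the document.</p>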
<p>Soon after I published this data, however, a number of people told me that it would be more useful in other formats. At first I just wrote a bunch of command-line Python scripts that would suck in these CSV files and spit them out in different formats. I quickly realized that I could collect these scripts into a quick and dirty web application.</p>
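<p>Those original scripts are not reproduced here; the following is only a minimal sketch of what one might look like, assuming each CSV file has a single header row. The <code>convert</code> and <code>main</code> names, the <code>--format</code> option, and the choice of JSON and TSV as output formats are hypothetical.</p>

```python
import argparse
import csv
import io
import json
import sys


def convert(csv_text, fmt):
    """Re-emit CSV text in another format ("json" or "tsv")."""
    if fmt == "json":
        # One object per data row, keyed by the header row.
        return json.dumps(list(csv.DictReader(io.StringIO(csv_text))))
    if fmt == "tsv":
        out = io.StringIO()
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerows(csv.reader(io.StringIO(csv_text)))
        return out.getvalue()
    raise ValueError("unsupported format: %r" % fmt)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Convert a CSV file to another format.")
    parser.add_argument("csv_path", nargs="?", help="CSV file to convert")
    parser.add_argument("--format", choices=["json", "tsv"], default="json")
    args = parser.parse_args(argv)
    if args.csv_path is None:
        # No file given: show usage instead of failing.
        parser.print_help()
        return
    with open(args.csv_path, newline="") as f:
        sys.stdout.write(convert(f.read(), args.format))


if __name__ == "__main__":
    main()
```

<p>Run it as, e.g., <code>python convert.py budget.csv --format json</code> (both file names are placeholders).</p>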
<p>Over a few train rides I created an application called <a href="http://www.dataio.org">DataIO</a>, and today, I finally got a chance to upload it to Google App Engine.  The application is pretty simple to interact with; instructions are located on its <a href="http://www.dataio.org">front page</a>.</p>
<p><span id="more-244"></span></p>
<p>Currently the application can only transpose data and multiply a data set by a given factor. I hope to add a JSONP API soon that will make it trivial to convert a given data set into a format that plays nicely with <a href="http://code.google.com/apis/chart/">Google Charts</a> and <a href="http://code.google.com/p/flot/">flot</a>.</p>
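<p>Transposing and scaling are both small operations on a list of rows. The sketch below is illustrative only, not DataIO's code (which lives in the bitbucket repository). It assumes rectangular tables of strings as read by Python's <code>csv</code> module, treats <code>start_row</code> as a 0-based index so that <code>start_row=1</code> leaves a single header row unscaled, and strips thousands separators before parsing numbers. The JSONP helper only shows the general shape of a JSONP response (a JSON payload wrapped in a caller-supplied callback), not the planned API. All function names are hypothetical.</p>

```python
import json


def transpose(rows):
    """Swap rows and columns; assumes every row has the same length."""
    return [list(column) for column in zip(*rows)]


def multiply(rows, factor, start_row=1):
    """Multiply numeric cells by `factor`, leaving rows before `start_row` as-is.

    `start_row` is a 0-based row index (an assumption). Cells that do not
    parse as numbers, such as labels or blanks, are copied unchanged.
    """
    result = []
    for index, row in enumerate(rows):
        if index < start_row:
            result.append(list(row))
            continue
        scaled = []
        for cell in row:
            try:
                # Budget PDFs often print "1,234"; drop the separators first.
                scaled.append(float(cell.replace(",", "")) * factor)
            except ValueError:
                scaled.append(cell)
        result.append(scaled)
    return result


def to_jsonp(data, callback):
    """Wrap a JSON payload in a JavaScript function call (the JSONP pattern)."""
    return "%s(%s);" % (callback, json.dumps(data))


# Made-up sample table, not real MTA figures.
table = [["Year", "2009", "2010"], ["Fares", "1.5", "2"]]
print(transpose(table))
print(multiply(table, 1000000))
print(to_jsonp(transpose(table), "handleData"))
```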
<p>The code for this application is hosted at <a href="http://bitbucket.org/anil/dataio/">Bitbucket</a>.</p>
<p>Just for fun, here is some data from data.topplabs.org, sent through DataIO:</p>
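<p><strong>Operating Revenue (transposed and multiplied by 1,000,000):</strong></p>
<p><a href="http://www.dataio.org/data/f?transpose=true&#038;multiplication_factor=1000000&#038;multiplication_start_row=1">http://www.dataio.org/data/f?transpose=true&#038;multiplication_factor=1000000&#038;multiplication_start_row=1</a></p>
<p><strong>Total Receipts by Agency (transposed and returned as JSON):</strong></p>
<p><a href="http://www.dataio.org/data/Bt?format=json&#038;transpose=true">http://www.dataio.org/data/Bt?format=json&#038;transpose=true</a></p>
<p><strong>Bridges and Tunnels Summary of Total Budgeted Debt Service (multiplied by 100 and returned as CSV):</strong></p>
<p><a href="http://www.dataio.org/data/IM?multiplication_factor=100&#038;multiplication_start_row=1&#038;format=csv">http://www.dataio.org/data/IM?multiplication_factor=100&#038;multiplication_start_row=1&#038;format=csv</a></p>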
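<p>Each of these URLs names a data set in the path and passes options in the query string. As a rough illustration of how such a request might be decoded on the server, here is a sketch using Python's standard <code>urllib.parse</code>. The function name and the defaults (no transpose, a factor of 1, a start row of 0, and <code>html</code> when no <code>format</code> is given) are assumptions, not DataIO's documented behavior.</p>

```python
from urllib.parse import parse_qs, urlsplit


def parse_dataio_options(url):
    """Decode a DataIO-style URL into a dict of options (hypothetical helper)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)

    def first(name, default):
        # parse_qs maps each key to a list of values; keep the first one.
        return query.get(name, [default])[0]

    return {
        "dataset": parts.path.rstrip("/").rsplit("/", 1)[-1],
        "transpose": first("transpose", "false").lower() == "true",
        "factor": float(first("multiplication_factor", "1")),
        "start_row": int(first("multiplication_start_row", "0")),
        "format": first("format", "html"),  # assumed default
    }


print(parse_dataio_options(
    "http://www.dataio.org/data/IM?multiplication_factor=100"
    "&multiplication_start_row=1&format=csv"
))
```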
]]></content:encoded>
			<wfw:commentRss>http://anilmakhijani.com/2009/04/12/dataio/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

