In my previous post on this topic, I said I was going to take one of the most humdrum and aggravating tasks a postmodern serf can face – buying a (used) car – and use coding and databases to optimize it.
The problem when dealing with any organization is the lopsided advantage in information arbitrage the org has. Dealing with management at work? They have the information and the experts to deploy it. Dealing with government bureaucracy? Same, or more so. Even in an ordinary consumer transaction, the business generally has the advantage over the private consumer. Any supermarket chain has troves of information and skilled tech workers to analyze it. It’s no surprise, therefore, that the best consumer apps and services are those that redress that imbalance in favor of the private consumer, like Zillow in real estate, or Kelley Blue Book for car-buying.
The area of the world in which I live is woefully underserved by useful consumer apps, so I decided to use some of the time the epidemic gave me to gather information from various sources, which I then shoehorned into a database. Truthfully, it’s an ad-hoc and clunky series of kludges requiring some manual intervention at each stage.
My choice of technologies for any project is usually learning-driven. Since Python and SQL were the tools I wanted to keep honing, I used them for this purpose. They have proved adequate and reliable.
My information sources were of two kinds: openly published Creative Commons data (usually downloadable as CSV files), which I prefer, and sources that are not so open. Web scraping is a whole kettle of fish I prefer not to go into. I did not deploy any automated scraper or spider, because Firefox has great extensions that make manual gathering of data much easier and faster (such as SingleFile and SnapLinks).
HTML files are parseable using Beautiful Soup:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open(the_file, encoding='utf8'), "html.parser")
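Pulling the interesting fields out of the soup is the part that varies from source to source. As a rough sketch (the selectors and keys here are made up – they depend entirely on the pages you saved – but the end result is a list of dicts, one per vehicle, which is what the CSV step below expects):
dictlist = []
for listing in soup.select("div.listing"):  # hypothetical container element
    dictlist.append({
        "model_name": listing.select_one("h2").get_text(strip=True),
        "year": listing.select_one(".year").get_text(strip=True),
        "price": listing.select_one(".price").get_text(strip=True),
    })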
and convertible into CSV files with plain Python, using the standard-library csv module and its DictWriter:
import csv
#...lots of other code here...
# newline='' prevents blank lines between rows on some platforms
with open(the_filename, 'w', encoding='utf-8', newline='') as f:
    w = csv.DictWriter(f, fieldnames=dictlist[0].keys())
    w.writeheader()
    w.writerows(dictlist)
#...lots of other code here...
These CSV files can be imported into SQLite using .import.
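For the record, the invocation looks something like this (file and table names are placeholders); if the table does not already exist, the first row of the CSV is used for the column names:
sqlite> .mode csv
sqlite> .import car_model.csv car_model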
If and when I want to put a front-end onto the data, I’ll consider either Django or React, depending on the direction I want to take down the road. For now I query the database directly on the command line, e.g.:
sqlite> select * from car_model where lower(model_name) like '%corolla%';
The one stage I do strongly suggest, however, is following Philip Greenspun’s advice regarding data modeling and investing time in understanding what one is building. To that end I made several SQL files runnable in SQLite using the .read command. They are idempotent, so I can re-run them whenever I want to rebuild the database.
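As a minimal sketch, one such idempotent schema file might look like this (the table and columns are illustrative, not my actual schema):
-- schema_car_model.sql: safe to .read repeatedly
DROP TABLE IF EXISTS car_model;
CREATE TABLE car_model (
    id          INTEGER PRIMARY KEY,
    make        TEXT NOT NULL,
    model_name  TEXT NOT NULL,
    trim_level  TEXT,
    year        INTEGER
);
Running it is just:
sqlite> .read schema_car_model.sql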
SQLite is very easy to move around: just copy the db file! So I transferred it to a small, cheap Android device which I can stick in my bag and take wherever I want to look at cars, like used-car lots. I can do on-the-spot command-line queries, inputting any license plate or model, and get back info about that vehicle.
To run SQLite on Android:
Install Termux from the Play Store (give it permission to read and write files!), then install SQLite:
pkg install sqlite
Then change to the directory into which you copied your db file, and run sqlite3:
$ sqlite3 vehicles.db
This will give the sqlite3 prompt for querying the database.
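From there, the lot-side lookup is just another query – something along these lines, assuming a hypothetical vehicle table with a plate_number column (the real schema depends on your data sources):
sqlite> select * from vehicle where plate_number = '12-345-67';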
Now, if I were to create a public-facing app, I would probably use PostgreSQL. However, as the idea is to keep things as simple as possible, SQLite is perfect for this niche application.
Equipped with my new app, I am going to do some fieldwork. Wish me luck!
By the way, what do you think I should emphasize in my data munging of models, trims, transmissions, and so on? What would be the most helpful in making sense of the many variables involved? You can leave a comment below.