Semantics and Scrapers (Flatiron - Day 007)


We covered a lot of material today, from the value of semantic markup in HTML to building web scrapers in Ruby. The day started with a review of the previous night’s homework: recreating a song cataloging program we had written earlier, this time using hash iteration. Organizing the song list with hashes rather than arrays let us attach metadata to each song (artist, album and song title) and look it up by hash key.
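To make the hash-based catalog concrete, here is a minimal sketch of the idea rather than the actual homework solution. The song data, the `SONGS` constant and the `build_catalog` method are all made up for illustration; the point is only that each song carries its metadata under named keys, and that iterating over the resulting hash gives a catalog grouped by artist and album.

```ruby
# Hypothetical sample data; the real homework used a different song list.
SONGS = [
  { artist: "Artist A", album: "First Album", title: "Song One" },
  { artist: "Artist A", album: "First Album", title: "Song Two" },
  { artist: "Artist B", album: "Other Album", title: "Song Three" }
].freeze

# Build a nested catalog of the form: artist => album => [titles].
def build_catalog(songs)
  catalog = {}
  songs.each do |song|
    albums = (catalog[song[:artist]] ||= {})   # albums for this artist
    titles = (albums[song[:album]] ||= [])     # titles on this album
    titles << song[:title]
  end
  catalog
end

# Iterate over the nested hashes to print the catalog.
build_catalog(SONGS).each do |artist, albums|
  puts artist
  albums.each do |album, titles|
    puts "  #{album}: #{titles.join(', ')}"
  end
end
```

With a plain array of strings, each lookup would mean parsing the string again; with hashes, `song[:artist]` or `catalog["Artist A"]` gets straight to the data.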

Later in the day we went to work on our student profiles, and along the way Avi taught us about the value of semantic markup and other front-end best practices. Applying proper markup to an HTML document helps search engines index and rank content more effectively, and it aligns with the spirit of the Internet: web pages should be aware of the structure of their own content. In creating our student profiles, it was tempting to take semantic shortcuts and place content in the wrong enclosing tags (e.g., putting list data in a table because a table comes with preset formatting). Avi emphasized, however, that the separation of content and style is a fundamental underpinning of the Web and should be respected. Content that is properly structured via HTML can be styled in any number of ways, so we shouldn’t let our interest in beautiful design affect how we semantically define our content. A great example of this is the CSS Zen Garden project. Take a look at a few of the designs there and you’ll realize that wildly different websites can be built on the exact same HTML content.

In the afternoon, we were tasked with building our very first web scraper, and its target was the student profile site we had just created. Using Nokogiri, a Ruby gem, I wrote a program that followed the links on the central index page to the individual student profile pages, parsed each one, and returned a hash of attributes covering each student’s work history, education, favorite apps, etc. For those who don’t know, scrapers are pretty awesome. They allow you to collect and organize data from web sites in any way you want, so long as you know what you’re looking for. Using scrapers, I could quickly collect sports data, find all the fitness classes at my gym, or pull information out of Wikipedia. Scrapers essentially allow you to extract large amounts of data for your own use. This is pretty powerful, and I’m excited to try to build more complex scrapers, and even to attach them to analytical engines that provide useful intelligence about a topic.

In the evening, I made it over to an event on civic technology and public hacking with a fellow student. The talk focused on the state of civic technology, and included short speeches by politicians, city officials and active members of the civic tech community. It was all very interesting, and it was exciting to see that this community is filled with passionate people who care about solving problems. Still, I was exhausted after a long day at Flatiron, so I hightailed it out of there when I got the chance and went back home to keep working on my scraper and watch the State of the Union.