Flatiron - Day 004

| Comments

This morning, we started with a review of the SQL homework and ventured into the wide world of SQL joins, followed by the start of our Ruby learning in the afternoon. Regarding SQL, I struggled a bit with the homework last night, but after I read up a bit more on joins, the confusion lessened and I was able to create a working quiz database. Our task was to (i) design a database schema, (ii) populate the database, (iii) query the database and (iv) determine quiz scores. I’ve published the SQL code I wrote to accomplish this task after the jump.

After a lunch of Chinese food delivered to the school, we delved into Ruby. The “Programmer of the Day” was, aptly enough, Yukihiro Matsumoto, or “Matz”, who invented the Ruby language. Avi described how Matz designed Ruby to be pleasing to the people using the language, not for the computer. The ultimate reason for creating Ruby was simply because it made Matz happy - not because of efficiency, power or universality, but rather human happiness. Ruby is a language designed to match the differences and idiosycracies inherent in human behavior and interaction. As Avi put it, our differences should be celebrated, not suppressed, and as a philosophical basis, Ruby enables different programmers to express themselves differently - there’s no one right way of doing something.

I enjoy Avi’s occasional expositions on the philosophical underpinnnings of code. I had never before thought about coding as an expressive and creative medium, on par with other art forms like music or painting, but the more I think about it, the more I can appreciate how code empowers people to exercise creativity. Even simple functions and programs can be written in any number of different ways. The mere variety of how much can be done using code is a testament to the infinite possibilities available with code.

For the rest of the afternoon, Avi lectured about Ruby and we broke off on occasion to work on practice problems. We covered variables, methods and basic control structures. All of the stuff we covered today was part of the class pre-work, but it was good to solidify my understanding, and I realized that there a few quirks of Ruby that I would have taken for granted if we had just skipped over the basic foundations of Ruby. For example, did you know that if you run a command like:

1
true || x = 3

And then try to print the value of x, you won’t be returned 3? That’s because Ruby, in evaluating the expression, first saw that the left side of the “OR” statement (indicated by ||) was true and so it automatically knew that the entire expression was true (at least one side was true). Thus, Ruby didn’t even assign the number 3 to the variable x. I hadn’t considered that behavior for Ruby before, so it was good to go through some of the more nuanced aspects of the language.

Learning Ruby took up most of the rest of the day. Tonight, we have to write a few basic programs in Ruby (FizzBuzz!) just to get more acquainted with the language.


SQL Exercise

This SQL code serves as the database schema and creates five tables - ‘users’, ‘quizzes’, ‘questions’, ‘choices’ and ‘answers’. Each quiz has multiple questions, each of which have multiple choices. Users take quizzes and select choices as their answers, which then populates the ‘answers’ table.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
-- CREATES DATABASE SCHEMA FOR QUIZ.DB

CREATE TABLE users (
  id integer primary key autoincrement,
  fname varchar(20) NOT NULL,
  lname varchar(50) NOT NULL
);

CREATE TABLE quizzes (
  id integer primary key autoincrement,
  name varchar(100) NOT NULL
);

CREATE TABLE questions (
  id integer primary key autoincrement,
  description varchar(200) NOT NULL,
  quiz_id integer,
  FOREIGN KEY(quiz_id) REFERENCES quizzes(quiz_id)
);

CREATE TABLE choices (
  id integer primary key autoincrement,
  content varchar(200),
  correct integer,
  question_id integer,
  FOREIGN KEY(question_id) REFERENCES questions(question_id)
);

CREATE TABLE answers (
  id integer primary key autoincrement,
  user_id integer,
  question_id integer,
  choice_id integer,
  FOREIGN KEY(user_id) REFERENCES users(user_id),
  FOREIGN KEY(question_id) REFERENCES questions(question_id),
  FOREIGN KEY(choice_id) REFERENCES choices(choice_id)
);

In creating the database, Avi made a few good points about efficient nomenclature. When I first created my database, I had redundant field names for my tables (e.g., “id” in the “choices” table was labeled “choice_id”). The prefix of “choice” for “id” is redundant, since the inclusion of the field in the “choices” table automatically associates “id” with “choices” and differentiates it from the “id” field in the other tables. Its not a huge deal to include the table prefix in the field name, but its unnecessary and the elimination of the prefix is an example of DRY (Don’t Repeat Yourself).

After creating the schema, we had to write SQL that would populate the database. The below SQL script adds data into the “users”, “quizzes”, “choices” and “questions” tables. As you’ll notice, there are 2 quizzes, each with 5 questions, each with 4 choices, for a total of 40 choices (2 * 5 * 4), hence the number of rows in the “choices” table. I used a “multi-insert” statement here rather than having INSERT INTO … VALUES surrounding each of my entries. This saves time and makes the code look cleaner. Also, I added the corresponding field names after the table name in each of the insert blocks - I don’t think this is necessary, as SQL will read the entries and assume that the first value goes into the first column, labeled “id”, and same with the other columns, but I figure that its best practice to be explicit with the field names so that there’s no ambiguity (for example, if you add fields in the future).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- INSERTS INITIAL VALUES INTO QUIZ.DB

INSERT INTO users (id, fname, lname) VALUES
  (1, "Derrick", "Rose"),
  (2, "Kevin", "Durant"),
  (3, "Lebron", "James");

INSERT INTO quizzes (id, name) VALUES
  (1, "Ruby on Rails"),
  (2, "SQLite3");

INSERT INTO questions (id, description, quiz_id) VALUES
  (1, "What is Ruby on Rails?", 1),
  (2, "Who created Ruby on Rails?", 1),
  (3, "Why is Ruby on Rails so awesome?", 1),
  (4, "Who uses Ruby on Rails?", 1),
  (5, "What are some Ruby on Rails competitors?", 1),
  (6, "What is SQLite3?", 2),
  (7, "Why is SQLite3 better or worse than other databases?", 2),
  (8, "How does SQLite3 vary from MySQL?", 2),
  (9, "Is MongoDB better than SQLite3?", 2),
  (10, "Why should you not use SQLite3 for large applications?", 2);

INSERT INTO choices (id, content, correct, question_id) VALUES
  (1, "Choice A", 1, 1),
  (2, "Choice B", 0, 1),
  (3, "Choice C", 0, 1),
  ...
  (40, "Choice D", 0, 10);

This SQL command populates the “answers” table with data. I purposefully populated the data such that Derrick Rose got all questions right (he’s the best of the 3, after all), Kevin Durant got 3 questions right in each quiz and Lebron got only 1 question right per quiz (I’m still bitter over the 2011 Eastern Conference Finals).

1
2
3
4
5
6
7
8
9
  -- INSERTS RESULTS OF 3 USERS EACH TAKING 2 TESTS 
  -- WITH 5 QUESTIONS INTO THE ANSWERS TABLE OF QUIZ.DB

INSERT INTO answers (id, user_id, question_id, choice_id) VALUES
  (1, 1, 1, 1),
  (2, 1, 2, 5),
  (3, 1, 3, 9),
  ...
  (30, 3, 10, 39);

This SQL command analyzes the resulting answers. In designing this query, I wanted to get back a table of six rows - one for each user-quiz pairing for 3 users, each taking 2 quizzes - with the user name, quiz name and total correct answers. I started the “answers” table, as I figured that “answers” would be my primary source for the scores; how could you determine a correct score without knowing the quiz takers’ answers?. From there, I joined the table against other tables that included useful information.

The “answers” table includes what each user selected as a choice for a given question, but not whether that choice is correct (including such data in this table would be de-normalizing the “correct answer” data, which lives in the “choices” table), so I joined the “answers” table against “choices” to determine which choices selected by users were correct. Then, I continued joining against “questions”, “users” and “quizzes” to get the necessary information. Finally, I grouped the results by user_id and quiz_id, which collapses the return table into the 6 row table for each user-quiz pair that I was looking for.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- CREATES A TABLE FOR EACH USER-QUIZ SET AND 
-- RETURNS THE SCORE EACH USER GOT ON A GIVEN QUIZ

SELECT
  u.fname AS 'first name',
  u.lname AS 'last name',
  u.id AS 'user id',
  qz.name AS 'quiz name',
  SUM(correct) AS 'raw score'
FROM ((((
  answers a INNER JOIN choices c ON a.choice_id = c.id)
  INNER JOIN questions qu ON a.question_id = qu.id)
  INNER JOIN users u ON a.user_id = u.id)
  INNER JOIN quizzes qz ON qu.quiz_id = qz.id)
GROUP BY a.user_id, qu.quiz_id;

Finally, I wrote some SQL to do some basic analytics on database. This basically is very similar to the above query, but adds the percentage score.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
-- CALCULATES PERCENTAGE CORRECT FOR A USER-QUIZ SET

SELECT
  u.fname AS 'first name',
  u.lname AS 'last name',
  qz.name AS 'quiz name',
  SUM(correct) AS 'raw score',
  SUM(correct) * 100 / 5 || '%' AS 'score percentage'
FROM ((((
  answers a INNER JOIN choices c ON a.choice_id = c.id)
  INNER JOIN questions qu ON a.question_id = qu.id)
  INNER JOIN users u ON a.user_id = u.id)
  INNER JOIN quizzes qz ON qu.quiz_id = qz.id)
GROUP BY a.user_id, qu.quiz_id;

Avi said that we wouldn’t be actively writing many SQL queries, since Rails abstracts so much of the database interaction. However, there will undoubtedly be times when Rails isn’t sufficient for complex data queries, and we’ll have to write our own SQL. Even if we don’t use SQL that often, its critical to know, and I think its pretty cool, despite the at times frustrating syntax and poor debugging feedback for the sqlite3 shell. I would like to eventually explore more advanced data analysis tools, software like Hadoop or Cassandra, so knowing SQL is going to be invaluable moving forward.

Comments