IERG4210 (Spring 2021)

Web & Database Servers

Sherman Chow

Agenda

  • "Modern"/"Non-traditional" Web Servers
  • Web Servers on the Cloud
    • Quick Intro. to Cloud
    • Architecture and Designs
  • Database (DB) (Storage) Servers
    • Quick Intro. to DB Storage
      • Database vs. Cache
    • Relational Database (MySQL, SQLite)
    • NoSQL
    • Quick Intro. to In-memory Cache (Redis)
  • DB Integrity and Constraints
  • Basic SQL Statements

Traditional vs. Newer-Generation Web Servers

Apache IIS Nginx Node
Concurrency Model threaded/process-oriented approach
(inefficient memory use and scheduling)
asynchronous event-driven approach
(no blocking, more scalable)
Common Programming Language PHP ASP.NET/PHP None/PHP JavaScript
Design Goals full-featured
generic purpose
less features,
serve specific purpose (e.g., cache/proxy)
specific purpose
app framework bundled w/ web server
Open-source yes proprietary yes yes

Web Server Survey (a go-to website for major news for web servers)

Fast and Scalable WebApp using Node.js

  • HTTP is a 1st-class citizen: designed w/ streaming & low latency in mind
  • Event-driven architecture and non-blocking I/O API from ground-up
    • Model best for slow/blocking (network) I/Os, now asynchronously handled
    • Concepts and benefits covered in past lectures and reading
    • Resource-efficient: Can easily scale up with Amazon Beanstalk
    • Unlike optional async from a library (e.g., Twisted in Python)
    • A "little harder" to program due to its asynchronous architecture
  • Code reuse due to single language across both client- and server-side
    • Developer-friendly to JS/AJAX folks
  • Fast v8 JavaScript Engine (as in Chrome)
    • JIT Compilation: Compiled to binary and runs like executable
    • More memory-efficient and Faster than vanilla PHP
      (cf., HipHop VM: open-source JIT-based VM for Hack language)
  • Still Wide Adoptions (was introduced in 2009):
    • Linkedin, Netflix, Uber, Trello, PayPal, NASA, eBay, Medium

Cloud and Scalable Computing

Why Cloud?

  • Uneven Utilizations by nature
    • Day of the week and hours of the day: Web surfing
    • Season of the year: Christmas e-gift cards
    • Adhoc usage: One-off computation jobs/testing

Reference: Amazon EC2 Auto Scaling User Guide: Amazon EC2 Auto Scaling benefits

Cloud Benefits: on-demand + sharing

Cloud Types - Cloud Service Models

  • Infrastructure-as-a-Service (IaaS)
  • Platform-as-a-Service (PaaS)
  • Software-as-a-Service (SaaS)

Reference: Cloud computing (Wikipedia)

Cloud Types (cont.)

  • What service model the following offer (SaaS, PaaS, IaaS)?
    • Salesforce.com
    • Amazon Web Services (AWS) Elastic Beanstalk
    • AWS Elastic Compute Cloud (EC2)
  • What other X-as-a-Service have you heard of?
  • Ownership Models
    • Public Cloud
    • Private Cloud
    • Hybrid Cloud
    • Community Cloud

Architecture of AWS Elastic Beanstalk

DNS then ELB

Ref: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/concepts-webserver.html

Scaling Up vs. Scaling Out

  • Scale-up (vertically): more resources for a single node
    • More expensive for supercomputer (less cost-effective)
  • Scale-out (horizontally): more nodes
    • A farm of cheaper instances well-networked (high throughput)
    • Application distributable in individual instances

What is Amazon EC2 Auto-Scaling?

Design Considerations

  • Scalability and Elasticity
    • Auto-scaling allows on-demand instances/node creation/removal
    • Possibly based on metrics: CPU, memory, disk I/O, network I/O...
  • Fault Tolerance for Availability
    • Automated recovery for EC2 instances when some of them die
    • Automated backups for storage
  • Software Deployability
    • Easy to deploy; and create a new environment to test out changes
    • Integrated with GIT for systematic versioning control

DB Storage Servers

Two Paradigms of DB Storage

  • Relational Databases
    • Structured in tables: Slow but powerful
    • Hard to scale
    • Accessible through the Structured Query Language (SQL)
    • Often used as persistence storage
    • Examples: MySQL (free), MSSQL, Oracle, SQLite (free), etc.
  • NoSQL Databases: "non-SQL" or "non-relational"
    • Existed since the late 1960s
    • The name "NoSQL" was only coined in the early 21st century
    • Unstructured as a tradeoff for speed
    • Easy to scale-out
    • Accessible through API
    • Optimized for speed, thus often as an in-memory cache
    • For applications that mostly query (vs. update) in async. manner
    • Inaccuracy can be tolerated by the application (see later slides)
    • Examples: Redis (free), Memcached (free), MongoDB, etc.

SQLite

  • Public domain license (i.e., FREE!)
  • Lightweight in design
    • Lightweight: multiple processes can read at the same time; however, only one process can make changes at any moment
    • Best for single-user apps (MobileApps/Simple WebApps)
  • Supported by multi-platforms (e.g., Windows, Linux)
    • Pre-installed in AWS EC2
  • Stores everything in a single file
    • Easy to embed, test, backup, and transfer
  • Simple access-right management
    • No user account management as in full-blown DBs like MySQL
    • Simply depending/relying on the file access rights

MySQL

  • Dual-licensing: GPL/FLOSS (if you don't need support), or proprietary
  • Relational DB
    • Table structure
  • Full-featured, accessible using SQL
    • But heavyweight, quite slow
    • Relatively powerful considering it is something for free
  • Supported by multi-platforms
    • Windows, Linux
    • Pre-installed in AWS EC2

Redis (Remote Dictionary Server) redis logo

  • Open-source NoSQL DB
    • Cache: In-memory DB as a key-value store (hence, very fast)
    • Supports a rather rich set of other data structures
      • strings, lists, sorted sets, bitmaps, geospatial indexes
  • Common use cases:
    • To serve queries, i.e., GET requests
      • e.g., cache your templates to prevent re-rendering
      • When to expire? Expire on DB update?
    • To completely serve as a DB
      • Data loss when the machine is down (it's in-memory)
      • Periodically backup data to persistent storage
      • Best of both worlds of MongoDB, Memcached
  • See Redis.io, Redis NPM, and AWS ElastiCache for details

Redis's Security Support

  • Basically, no support :(
  • Designed to be accessed only by trusted clients
  • Support some simple authentication
  • Can be restricted to certain interfaces
  • No data encryption (slow for a very large DB)

NoSQL Database

  • Big Data --> Scalability
    • Distributed processing
    • Live with the CAP Theorem (see the next slide)
  • Data Analysis vs. Data Modification
    • Few updates on data
    • Mostly query
    • Need to get results quickly

CAP Theorem

  • Core Requirements of Distributed Systems
  • Trilemma, you can only choose two (and relax the remaining)
    • C&A: Traditional Relational Database Management System
    • C&P: Redis
    • A&P: CouchDB
CAP 
Venn Diagram

SQL or NoSQL?

  • Traditional databases (C&A) focus on ACID
    • Atomicity, Consistency, Isolation, Durability
  • NoSQL (A&P): BASE (no, we are not teaching chemistry)
    • Basic Availability, Soft state, Eventual consistency
      • Weak consistency ??? Stale data OK
      • Availability first -- Basically Available
      • Best effort
      • Approximate answers OK
      • Aggressive (optimistic)
      • Simpler and faster

Different Data Models for NoSQL (1/2)

  • Key-Value Store
    • Collection of key/value pairs
    • E.g., Redis
  • Column-family System (multidimensional sorted map)
    • a key-value store but further structures the value into families
    • Very light "schema": (rowKey, colKey, timestamp) -> value
      • e.g., url as row keys, and "component" of a web as the value
      • Diff. aspects of the page as columns, e.g., contents, anchor
    • E.g., BigTable, Hadoop HBase
  • Document-based
    • Documents with tags, metadata, or a certain structure
      • e.g., .pdf, .xml, etc.
    • E.g., MongoDB, Apache CouchDB
  • Graph-based

Different Data Models for NoSQL (2/2)

Database Integrity and Constraints

Issues on Database Integrity

  • Various concepts of Integrity
    • Entity Integrity: every record (row) in a table is unique
    • Referential Integrity: data are consistent across multiple tables
    • Column Integrity: data of the same column have the same "type"
      • What defines a data type? --> range, precision, and supported operations
      • Provide DEFAULT value
      • Or use CHECK(logicExpr)
    • Other User-Defined Integrity: any special requests over the data
  • Two Styles
    • Static: define some static constraints when creating the table
    • Dynamic: define some logic conditions or code that would be executed to perform the integrity check

Using Constraints to Achieve Integrity Goals (e.g. for Tables)

  • NOT NULL: Value must be defined
    • Note the differences between NULL and 0 or False values
  • UNIQUE
    • Applied to a single column
    • One and only one record could have NULL value (which is different from Primary Key)
  • Primary Key: unique + not-null
    • Also, only one primary key is allowed
    • But could have multiple columns with unique-and-not-null constraints
    • Could be a combination of multiple columns (as long as the combination is unique and with no NULL sub-column)
  • Foreign Key: to prevent illegal data
    • Could not delete primary key if it is used as a foreign key
    • Could not insert a record with a foreign key that has no corresponding primary key there

Basic SQL Statements

Connect to the DB using MySQL Command Line Interface (CLI)

1. connecting to server via SSH
2. install software: For Redhat: sudo yum install mysql -y
   for Debian/Ubuntu: $ sudo apt-get install mysql-server mysql-client
3. start the mysql server: $ mysql -u root -p -h <your-db>.rds.amazonaws.com
  • May need to enter your configured Master Password
  • It fails when connecting directly to MySQL from local. Why?
    • The EC2 instances and MySQL are in the same Security Group, hence in the same Virtual Private Cloud (network)
    • From Security Group settings, expect only port 80 is opened
      • SSH is hosted at port 22 and is dynamically made accessible by eb ssh
      • MySQL is hosted at 3306 and thus cannot be accessed by public
    • Hence, access the DB thru EC2

MySQL: Create a User and DB

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 26
Server version: 5.5.40-log Source distribution

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> CREATE USER 'shopXX-admin' IDENTIFIED BY 'mypass';
mysql> CREATE DATABASE shopXX;
mysql> SHOW DATABASES;
mysql> GRANT ALL ON shopXX.* TO 'shopXX-admin';
mysql> exit;
- There are some tables (mysql, schema) that you do not want to mess up with 

MySQL: Login using New User and Pick new DB

  • Re-login using the newly created user
    • $ mysql -u shopXX-admin -p -h [YourHostName]
    • The hostname may be like: <your-db>.rds.amazonaws.com
    • You can replace it with a local server
  • Pick the newly created DB
    • mysql> USE shopXX;
  • Security considerations:
    • shopXX-admin is granted full access to DB called shopXX
    • Practise the least-privilege approach (e.g., SELECT)
    • To drop/delete the user:
      mysql> DROP USER shopXX-admin
  • Using shopXX-admin@localhost will fail because we'd support remote logins from the EC2

Create a Table (1/2)

- Creating a Table ``categories''
mysql> CREATE TABLE categories ( 
  catid INTEGER PRIMARY 
  KEY AUTO_INCREMENT, name VARCHAR(512) NOT NULL
) ENGINE=INNODB;
  • Primary key is unique and auto-increment by default
    (i.e., incremented by 1 automatically for every new record)
  • Ref: MySQL: Creating InnoDB Tables
  • To check/recall what you have created
    mysql> DESCRIBE 
    categories;
  • To drop/delete the whole table and data
mysql> DROP TABLE categories;
mysql> TRUNCATE categories; /* To drop/delete all the data*/

Create a Table (2/2)

  • Creating a Table ``Product''
    (simplified and redacted, add price/description type yourself)
    mysql> CREATE TABLE products (
    pid INTEGER PRIMARY KEY AUTO_INCREMENT,
    catid INTEGER,
    name VARCHAR(512),
    price ______________,
    description _____________,
    FOREIGN KEY(catid)
    REFERENCES categories(catid)
    ) ENGINE=INNODB;
  • Create an index for catid to make subsequent queries by it faster
    mysql> CREATE INDEX i1 ON products(catid);
  • Question: What is INDEX?

  • Reference: Data types supported by MySQL

    INSERT (for adding records to a table)

    - Note: put null for the primary key to let it auto-increment 
    mysql> INSERT INTO categories VALUES (null, "Fruits");
    - To insert 2 records into products
    mysql> INSERT INTO products
    VALUES (null, 1, "Apple", "1.5");
    mysql> INSERT INTO products (catid, name, price)
    VALUES (1, "Banana", "1.5");
    - Try to insert a product to an inexistent category:
    mysql> INSERT INTO products (catid, name, price)
    VALUES (2, "Help", "999");
    - Error: constraint failed
    - Note: This error is expected given that the foreign key setting
    More on SQL INSERT: http://dev.mysql.com/doc/en/insert.html

    SELECT (for looking up records)

    - To select all "fruits" in products (given fruits is of catid=1) 
      mysql> SELECT * 
      FROM products WHERE 
      catid = 1;
    - To select only the name and price columns 
      mysql> SELECT name, price FROM products WHERE catid = 1;
    - To select only 5 "fruits" in products
      mysql> SELECT * FROM products WHERE catid = 1 LIMIT 5;
    - To select the 11-20th most expensive "fruit" products 
      mysql> SELECT * FROM products
      WHERE catid = 1
      ORDER 
      BY price DESC 
      LIMIT 11, 10;
    
    - Recall: For those frequently-queried columns, create INDEX (trading off space for speed) 
    More on SQL SELECT: http://dev.mysql.com/doc/en/select.html

    UPDATE

    - Setting a static value
      mysql> UPDATE categories 
      SET name = "Fresh Fruits" 
      WHERE catid = 1;
    -Setting an expression (e.g., 10% increase in price) 
      mysql> UPDATE products 
      SET price = price * 1.1 
      WHERE pid = 2;
    • The WHERE condition is the same as that of SELECT
    • So, when you are not sure about what records are affected
      • SELECT the records first, then replace it with UPDATE
      • Otherwise, you can kill all your data unintentionally
    • More on SQL UPDATE: http://dev.mysql.com/doc/en/update.html

    DELETE

    • The following code will raise an error. Why? (hints: foreign key)
      mysql> DELETE FROM categories WHERE catid = 1;
    • Deleting a product
      mysql> DELETE FROM products WHERE pid = 2;
    • The WHERE conditions are again the same as that of SELECT
    • More on SQL DELETE: http://dev.mysql.com/doc/en/delete.html
    • Deleting records requires extra attention!
      • Backup your database
      • Or SELECT what rows are affected before performing DELETE

    Database Abstraction Layer

    • A universal interface for accessing different databases
    • Coding Consistency: Regardless of the DB, use the same set of code
    • Single Interface: Easy to switch database without code modifications
    • E.g., for PHP, or any-db for Node.js

    Image Ref: W. Jason Gilmore, Beginning PHP and MySQL: From Novice to Professional, p.794, 2007