IERG4210 (Spring 2021)

Web & Database Servers

Sherman Chow

Agenda

"Modern"/"Non-traditional" Web Servers
Web Servers on the Cloud
- Quick Intro. to Cloud
- Architecture and Designs
Database (DB) (Storage) Servers
- Quick Intro. to DB Storage
  - Database vs. Cache
- Relational Database (MySQL, SQLite)
- NoSQL
- Quick Intro. to In-memory Cache (Redis)
DB Integrity and Constraints
Basic SQL Statements

Traditional vs. Newer-Generation Web Servers

	Apache	IIS	Nginx	Node
Concurrency Model	threaded/process-oriented approach (inefficient memory use and scheduling)		asynchronous event-driven approach (no blocking, more scalable)
Common Programming Language	PHP	ASP.NET/PHP	None/PHP	JavaScript
Design Goals	full-featured generic purpose		less features, serve specific purpose (e.g., cache/proxy)	specific purpose app framework bundled w/ web server
Open-source	yes	proprietary	yes	yes

Web Server Survey (a go-to website for major news for web servers)

Fast and Scalable WebApp using Node.js

HTTP is a 1st-class citizen: designed w/ streaming & low latency in mind
Event-driven architecture and non-blocking I/O API from ground-up
- Model best for slow/blocking (network) I/Os, now asynchronously handled
- Concepts and benefits covered in past lectures and reading
- Resource-efficient: Can easily scale up with Amazon Beanstalk
- Unlike optional async from a library (e.g., Twisted in Python)
- A "little harder" to program due to its asynchronous architecture
Code reuse due to single language across both client- and server-side
- Developer-friendly to JS/AJAX folks
Fast v8 JavaScript Engine (as in Chrome)
- JIT Compilation: Compiled to binary and runs like executable
- More memory-efficient and Faster than vanilla PHP
  (cf., HipHop VM: open-source JIT-based VM for Hack language)
Still Wide Adoptions (was introduced in 2009):
- Linkedin, Netflix, Uber, Trello, PayPal, NASA, eBay, Medium

Cloud and Scalable Computing

Why Cloud?

Uneven Utilizations by nature
- Day of the week and hours of the day: Web surfing
- Season of the year: Christmas e-gift cards
- Adhoc usage: One-off computation jobs/testing

Reference: Amazon EC2 Auto Scaling User Guide: Amazon EC2 Auto Scaling benefits

Cloud Types - Cloud Service Models

Infrastructure-as-a-Service (IaaS)
Platform-as-a-Service (PaaS)
Software-as-a-Service (SaaS)

Reference: Cloud computing (Wikipedia)

Cloud Types (cont.)

What service model the following offer (SaaS, PaaS, IaaS)?
- Salesforce.com
- Amazon Web Services (AWS) Elastic Beanstalk
- AWS Elastic Compute Cloud (EC2)
What other X-as-a-Service have you heard of?
Ownership Models
- Public Cloud
- Private Cloud
- Hybrid Cloud
- Community Cloud

Architecture of AWS Elastic Beanstalk

Ref: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/concepts-webserver.html

Scaling Up vs. Scaling Out

Scale-up (vertically): more resources for a single node
- More expensive for supercomputer (less cost-effective)
Scale-out (horizontally): more nodes
- A farm of cheaper instances well-networked (high throughput)
- Application distributable in individual instances

What is Amazon EC2 Auto-Scaling?

Design Considerations

Scalability and Elasticity
- Auto-scaling allows on-demand instances/node creation/removal
- Possibly based on metrics: CPU, memory, disk I/O, network I/O...
Fault Tolerance for Availability
- Automated recovery for EC2 instances when some of them die
- Automated backups for storage
Software Deployability
- Easy to deploy; and create a new environment to test out changes
- Integrated with GIT for systematic versioning control

DB Storage Servers

Two Paradigms of DB Storage

Relational Databases
- Structured in tables: Slow but powerful
- Hard to scale
- Accessible through the Structured Query Language (SQL)
- Often used as persistence storage
- Examples: MySQL (free), MSSQL, Oracle, SQLite (free), etc.
NoSQL Databases: "non-SQL" or "non-relational"
- Existed since the late 1960s
- The name "NoSQL" was only coined in the early 21st century
- Unstructured as a tradeoff for speed
- Easy to scale-out
- Accessible through API
- Optimized for speed, thus often as an in-memory cache
- For applications that mostly query (vs. update) in async. manner
- Inaccuracy can be tolerated by the application (see later slides)
- Examples: Redis (free), Memcached (free), MongoDB, etc.

SQLite

Public domain license (i.e., FREE!)
Lightweight in design
- Lightweight: multiple processes can read at the same time; however, only one process can make changes at any moment
- Best for single-user apps (MobileApps/Simple WebApps)
Supported by multi-platforms (e.g., Windows, Linux)
- Pre-installed in AWS EC2
Stores everything in a single file
- Easy to embed, test, backup, and transfer
Simple access-right management
- No user account management as in full-blown DBs like MySQL
- Simply depending/relying on the file access rights

MySQL

Dual-licensing: GPL/FLOSS (if you don't need support), or proprietary
Relational DB
- Table structure
Full-featured, accessible using SQL
- But heavyweight, quite slow
- Relatively powerful considering it is something for free
Supported by multi-platforms
- Windows, Linux
- Pre-installed in AWS EC2

Redis (Remote Dictionary Server)

Open-source NoSQL DB
- Cache: In-memory DB as a key-value store (hence, very fast)
- Supports a rather rich set of other data structures
  - strings, lists, sorted sets, bitmaps, geospatial indexes
Common use cases:
- To serve queries, i.e., GET requests
  - e.g., cache your templates to prevent re-rendering
  - When to expire? Expire on DB update?
- To completely serve as a DB
  - Data loss when the machine is down (it's in-memory)
  - Periodically backup data to persistent storage
  - Best of both worlds of MongoDB, Memcached
See Redis.io, Redis NPM, and AWS ElastiCache for details

Redis's Security Support

Basically, no support :(
Designed to be accessed only by trusted clients
Support some simple authentication
Can be restricted to certain interfaces
No data encryption (slow for a very large DB)

NoSQL Database

Big Data --> Scalability
- Distributed processing
- Live with the CAP Theorem (see the next slide)
Data Analysis vs. Data Modification
- Few updates on data
- Mostly query
- Need to get results quickly

CAP Theorem

Core Requirements of Distributed Systems
Trilemma, you can only choose two (and relax the remaining)
- C&A: Traditional Relational Database Management System
- C&P: Redis
- A&P: CouchDB

SQL or NoSQL?

Traditional databases (C&A) focus on ACID
- Atomicity, Consistency, Isolation, Durability
NoSQL (A&P): BASE (no, we are not teaching chemistry)
- Basic Availability, Soft state, Eventual consistency
  - Weak consistency ??? Stale data OK
  - Availability first -- Basically Available
  - Best effort
  - Approximate answers OK
  - Aggressive (optimistic)
  - Simpler and faster

Different Data Models for NoSQL (1/2)

Key-Value Store
- Collection of key/value pairs
- E.g., Redis
Column-family System (multidimensional sorted map)
- a key-value store but further structures the value into families
- Very light "schema": (rowKey, colKey, timestamp) -> value
  - e.g., url as row keys, and "component" of a web as the value
  - Diff. aspects of the page as columns, e.g., contents, anchor
- E.g., BigTable, Hadoop HBase
Document-based
- Documents with tags, metadata, or a certain structure
  - e.g., .pdf, .xml, etc.
- E.g., MongoDB, Apache CouchDB
Graph-based
- E.g., Neo4J

Different Data Models for NoSQL (2/2)

Database Integrity and Constraints

Issues on Database Integrity

Various concepts of Integrity
- Entity Integrity: every record (row) in a table is unique
- Referential Integrity: data are consistent across multiple tables
- Column Integrity: data of the same column have the same "type"
  - What defines a data type? --> range, precision, and supported operations
  - Provide DEFAULT value
  - Or use CHECK(logicExpr)
- Other User-Defined Integrity: any special requests over the data
Two Styles
- Static: define some static constraints when creating the table
- Dynamic: define some logic conditions or code that would be executed to perform the integrity check

Using Constraints to Achieve Integrity Goals (e.g. for Tables)

NOT NULL: Value must be defined
- Note the differences between NULL and 0 or False values
UNIQUE
- Applied to a single column
- One and only one record could have NULL value (which is different from Primary Key)
Primary Key: unique + not-null
- Also, only one primary key is allowed
- But could have multiple columns with unique-and-not-null constraints
- Could be a combination of multiple columns (as long as the combination is unique and with no NULL sub-column)
Foreign Key: to prevent illegal data
- Could not delete primary key if it is used as a foreign key
- Could not insert a record with a foreign key that has no corresponding primary key there

Basic SQL Statements

Connect to the DB using MySQL Command Line Interface (CLI)

1. connecting to server via SSH
2. install software: For Redhat: sudo yum install mysql -y
   for Debian/Ubuntu: $ sudo apt-get install mysql-server mysql-client
3. start the mysql server: $ mysql -u root -p -h <your-db>.rds.amazonaws.com

May need to enter your configured Master Password
It fails when connecting directly to MySQL from local. Why?
- The EC2 instances and MySQL are in the same Security Group, hence in the same Virtual Private Cloud (network)
- From Security Group settings, expect only port 80 is opened
  - SSH is hosted at port 22 and is dynamically made accessible by eb ssh
  - MySQL is hosted at 3306 and thus cannot be accessed by public
- Hence, access the DB thru EC2

MySQL: Create a User and DB

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 26
Server version: 5.5.40-log Source distribution

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE USER 'shopXX-admin' IDENTIFIED BY 'mypass';
mysql> CREATE DATABASE shopXX;
mysql> SHOW DATABASES;
mysql> GRANT ALL ON shopXX.* TO 'shopXX-admin';
mysql> exit;
- There are some tables (mysql, schema) that you do not want to mess up with

MySQL: Login using New User and Pick new DB

Re-login using the newly created user
- $ mysql -u shopXX-admin -p -h [YourHostName]
- The hostname may be like: <your-db>.rds.amazonaws.com
- You can replace it with a local server
Pick the newly created DB
- mysql> USE shopXX;
Security considerations:
- shopXX-admin is granted full access to DB called shopXX
- Practise the least-privilege approach (e.g., SELECT)
- To drop/delete the user:
```
mysql> DROP USER shopXX-admin
```

Using shopXX-admin@localhost will fail because we'd support remote logins from the EC2

Create a Table (1/2)

- Creating a Table ``categories''

mysql> CREATE TABLE categories ( 
  catid INTEGER PRIMARY 
  KEY AUTO_INCREMENT, name VARCHAR(512) NOT NULL
) ENGINE=INNODB;

Primary key is unique and auto-increment by default
(i.e., incremented by 1 automatically for every new record)
Ref: MySQL: Creating InnoDB Tables
To check/recall what you have created
```
mysql> DESCRIBE 
categories;
```
To drop/delete the whole table and data

mysql> DROP TABLE categories;
mysql> TRUNCATE categories; /* To drop/delete all the data*/

Create a Table (2/2)

Creating a Table ``Product''
(simplified and redacted, add price/description type yourself)

mysql> CREATE TABLE products (
pid INTEGER PRIMARY KEY AUTO_INCREMENT,
catid INTEGER,
name VARCHAR(512),
price ______________,
description _____________,
FOREIGN KEY(catid)
REFERENCES categories(catid)
) ENGINE=INNODB;

Create an index for catid to make subsequent queries by it faster
```
mysql> CREATE INDEX i1 ON products(catid);
```

Question: What is INDEX?

Reference: Data types supported by MySQL

INSERT (for adding records to a table)

- Note: put null for the primary key to let it auto-increment

mysql> INSERT INTO categories VALUES (null, "Fruits");

- To insert 2 records into products

mysql> INSERT INTO products
VALUES (null, 1, "Apple", "1.5");
mysql> INSERT INTO products (catid, name, price)
VALUES (1, "Banana", "1.5");

- Try to insert a product to an inexistent category:

mysql> INSERT INTO products (catid, name, price)
VALUES (2, "Help", "999");

- Error: constraint failed
- Note: This error is expected given that the foreign key setting

More on SQL INSERT: http://dev.mysql.com/doc/en/insert.html

SELECT (for looking up records)

- To select all "fruits" in products (given fruits is of catid=1) 
  mysql> SELECT * 
  FROM products WHERE 
  catid = 1;

- To select only the name and price columns 
  mysql> SELECT name, price FROM products WHERE catid = 1;

- To select only 5 "fruits" in products
  mysql> SELECT * FROM products WHERE catid = 1 LIMIT 5;

- To select the 11-20th most expensive "fruit" products 
  mysql> SELECT * FROM products
  WHERE catid = 1
  ORDER 
  BY price DESC 
  LIMIT 11, 10;

- Recall: For those frequently-queried columns, create INDEX (trading off space for speed)

More on SQL SELECT: http://dev.mysql.com/doc/en/select.html

UPDATE

- Setting a static value
  mysql> UPDATE categories 
  SET name = "Fresh Fruits" 
  WHERE catid = 1;

-Setting an expression (e.g., 10% increase in price) 
  mysql> UPDATE products 
  SET price = price * 1.1 
  WHERE pid = 2;

The WHERE condition is the same as that of SELECT
So, when you are not sure about what records are affected
- SELECT the records first, then replace it with UPDATE
- Otherwise, you can kill all your data unintentionally
More on SQL UPDATE: http://dev.mysql.com/doc/en/update.html

DELETE

The following code will raise an error. Why? (hints: foreign key)
```
mysql> DELETE FROM categories WHERE catid = 1;
```

Deleting a product

mysql> DELETE FROM products WHERE pid = 2;

The WHERE conditions are again the same as that of SELECT
More on SQL DELETE: http://dev.mysql.com/doc/en/delete.html
Deleting records requires extra attention!
- Backup your database
- Or SELECT what rows are affected before performing DELETE

Database Abstraction Layer

A universal interface for accessing different databases
Coding Consistency: Regardless of the DB, use the same set of code
Single Interface: Easy to switch database without code modifications
E.g., for PHP, or any-db for Node.js