IERG4210 (Spring 2021)

Forms II - Server-side Implementation

Sherman Chow

Agenda

  • (Quick Recall) Request Methods: GET vs. POST
  • PHP, a server-side language:
    • Basics
    • String
    • Processing Arrays
  • Form/Request Handling with PHP:
    • Input - Sanitizations and Validations
    • Process - DB Manipulation
    • Output - HTML vs. JSON

PHP Language

HTTP Request Method: GET vs. POST

  • We covered in Lecture 4: Client-side Implementations of Forms
    • Input Controls -> Validations -> Form Submissions
  • No matter how grand the client-side is, a server will receive:
  • GET Request
    (Parameters are appended as query string at the URL)
        GET /index.php?catid=3 HTTP/1.1
        Host: www.shop.ierg4210.org
  • or POST Request
    (Parameters are encoded as the request body)
        POST /admin-process.php HTTP/1.1
        Host: secure.shop.ierg4210.org
        Content-Length: 37
        Content-Type: application/x-www-form-urlencoded
    
        name=Fresh%20Fruits&action=cat_insert
    (Note that there are 2 additional request headers)
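
The encoded POST body and its Content-Length can be reproduced with PHP's http_build_query(). This is only a sketch; the variable names are illustrative and not part of the course code:

```php
<?php
// Form fields as the browser would collect them.
$fields = array('name' => 'Fresh Fruits', 'action' => 'cat_insert');

// PHP_QUERY_RFC3986 encodes a space as %20 (the default would use +).
$body = http_build_query($fields, '', '&', PHP_QUERY_RFC3986);

echo $body, "\n";          // name=Fresh%20Fruits&action=cat_insert
echo strlen($body), "\n";  // 37, the Content-Length of the POST request

// For GET, the same kind of encoded string is appended to the URL instead.
$url = '/index.php?' . http_build_query(array('catid' => 3));
echo $url, "\n";           // /index.php?catid=3
```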

Server-side Web Programming Languages

  • Which one is the most popular server-side language?
    Rank  Language  Usage
    1.    PHP       79.2%
    2.    ASP.NET    9.1%
    3.    Ruby       4.4%
    4.    Java       3.4%
    5.    Scala      1.8%

Ref: W3Techs.com, retrieved on Feb. 17, 2021

Architecture of Web Server + PHP

Typical Workflow of a PHP Request
  • Output from PHP engine will be sent back to the client directly
  • Use chmod 705 to allow read (4) and execution (1) for public
  • Test with e.g., phptester.net

PHP Basics (1/3)

  • PHP is a Server-side Scripting Language
    • Create a file that ends with .php, e.g., test.php
    • Insert PHP code anywhere, e.g., <?php echo date('Y-m-d'); ?>
    • Content outside of <?php ... ?> tags will be kept as it is
    • Code inside <?php ... ?> tags will be executed and be replaced by its execution results (like output to stdout)
  • C-like syntax with a few syntactic differences:
    • All variables start with the $ sign, e.g., $data, $array
    • No need to declare a variable before use
    • Dynamic Typing Variables (e.g., $a = 1; $a = 'hello';)
  • Function-level scoping for variables (unlike C's block-level scoping; similar to var in JavaScript)
  • Please compare and contrast it with JavaScript
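
A small sketch of how typing and scoping behave in practice. The variable and function names are made up for illustration:

```php
<?php
$a = 1;                 // no declaration needed; $a holds an integer
echo gettype($a), "\n"; // integer
$a = 'hello';           // the same variable can later hold a string
echo gettype($a), "\n"; // string

if (true) {
    $insideBlock = 'set inside the if block';
}
// A block does not open a new scope, so the variable is still visible here.
echo $insideBlock, "\n";

function readsGlobal() {
    // A function body is its own scope: the outer $a is not visible here
    // unless it is imported with the global keyword.
    return isset($a) ? 'visible' : 'not visible';
}
echo readsGlobal(), "\n"; // not visible
```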

PHP Basics (2/3)

  • Code hidden from client-side; show only processed output
  • For example, given a hello.php with its content as follows:
            <h1><?php echo "Hello World"; ?></h1>
  • Only the following is visible to the browser when visiting hello.php:
            <h1>Hello World</h1>
  • Hence, dynamic HTML outputs can be mixed with static HTML
    (Feature vs. Vulnerability!)

PHP Basics (3/3)

  • Use PHP to sanitize user inputs to avoid Cross-Site Scripting attack
    (Not the first time you hear about XSS, but we will talk about it later)
    • DO NOT trust users' input, ever
      (why is the following dangerous?)
      <h1>Good morning, <?php echo $name; ?>.</h1>
    • Apply context-dependent output sanitizations instead:
      <h1>Good morning, <?php echo htmlspecialchars($name); ?>.</h1>
  • htmlspecialchars() escapes < to &lt; and > to &gt;, etc.
  • AVOID writing JavaScript with PHP
    • Recall that I just asked you to compare PHP with JavaScript?
    • We lack a good sanitization function!
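
To make the danger concrete, here is a minimal sketch with a made-up attacker-supplied value for $name. Passing ENT_QUOTES and 'UTF-8' explicitly is our choice, not something the slide prescribes:

```php
<?php
// Hypothetical attacker-controlled value, e.g. taken from a form field.
$name = '<script>alert("XSS")</script>';

// Unsafe: the browser would parse this as markup and run the script.
$unsafe = '<h1>Good morning, ' . $name . '.</h1>';

// Safe for an HTML context: the special characters become entities.
$safe = '<h1>Good morning, '
      . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '.</h1>';

echo $safe, "\n";
// <h1>Good morning, &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;.</h1>
```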

PHP String Processing

  • Difference between single-quoted ' and double-quoted " strings
    (and linebreak \n and <br />)
    PHP code                    Output in the browser
    echo "Hello\nWorld";        Hello World
    echo "Hello<br />World";    Hello
                                World
    echo 'Hello\nWorld';        Hello\nWorld
  • String Concatenation - joined by a dot (vs. + in JavaScript)
    <ul><?php $name="Apple";
          echo "<li>" . $name . "</li>";?></ul>
  • Some Useful Functions
    <?php strlen("hello") == 5;      // true
      strpos("hello", "l") == 2;     // true
      $a = ''; empty($a);            // true
      print_r($array);               // prints the contents of $array
    ?> 
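
A runnable sketch of the quoting rules and helper functions above. The strict comparisons (===) are an addition of ours, since strpos() can return either 0 or false:

```php
<?php
$name = 'Apple';

// Double quotes interpret escape sequences and interpolate variables.
echo "Item: $name\n";        // Item: Apple, followed by a real newline
// Single quotes keep everything literal.
echo 'Item: $name\n', "\n";  // Item: $name\n

// Concatenation uses the dot operator.
$item = '<li>' . $name . '</li>';

// strpos() returns false when nothing is found, and 0 for a match at the
// start, so compare with === or !== rather than ==.
var_dump(strpos('hello', 'l'));            // int(2)
var_dump(strpos('hello', 'h') === 0);      // bool(true)
var_dump(strpos('hello', 'z') === false);  // bool(true)

// empty() is true for '', 0, '0', null, false and empty arrays.
$a = '';
var_dump(empty($a));                       // bool(true)
```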

PHP Arrays (1/2)

  • Numeric Array (similar to JavaScript array [])
    $fruits = array("apple", "orange", "pineapple");
  • Associative Array (similar to JavaScript object {})
    $ages=array("Niki"=>6, "Jon"=>9, "Steve" => 40);
  • To add/edit an element (dynamic-sized)
       $fruits[] = "banana";// create a new element
       $fruits[1] = "o2";	// changed orange to o2
       $ages["Peter"] = 10;	// added a new element
       $ages["Niki"]++;	// passed her birthday
  • To remove an element
       unset($fruits[1]);	// o2 is *deleted*
       unset($ages["Steve"]);	// R.I.P. Steve...
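
The operations above can be traced step by step. One detail worth seeing in a sketch: unset() leaves a gap in the numeric keys. Calling array_values() to re-index is our addition:

```php
<?php
$fruits = array('apple', 'orange', 'pineapple');
$fruits[] = 'banana';   // appended at index 3
$fruits[1] = 'o2';      // index 1 now holds 'o2'
unset($fruits[1]);      // removes index 1 but does not renumber the rest

print_r(array_keys($fruits));     // keys 0, 2, 3: note the gap at 1
$fruits = array_values($fruits);  // re-index to 0, 1, 2
echo count($fruits), "\n";        // 3

$ages = array('Niki' => 6, 'Jon' => 9, 'Steve' => 40);
$ages['Peter'] = 10;
$ages['Niki']++;
unset($ages['Steve']);
var_dump(isset($ages['Steve']));  // bool(false)
echo $ages['Niki'], "\n";         // 7
```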

PHP Arrays (2/2)

  • Looping over numeric array (e.g., via index $i)
        for ($i = 0, $len = count($fruits); $i < $len; $i++)
  • Looping over associative array
        foreach ($ages as $key => $val)
          /* do something with $key and $val */
  • array_push() and array_pop()
    - Using numeric array as a stack
  • implode() - Join array elements with a string
    explode() - Split a string by string
    (similar to Array.join() / String.split() in JavaScript)
  • array_map('callback_fx', $array)
    - Applies callback_fx to the elements of the given arrays
  • sort() - Sort an array (pass it by reference)
  • array_diff() - Returns the values of the first array that are absent from the others (what if no diff.?)
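
A short sketch exercising the array functions listed above, with sample data invented for illustration:

```php
<?php
$fruits = array('pineapple', 'apple', 'orange');

// sort() reorders the array in place and returns a bool.
sort($fruits);
echo implode(', ', $fruits), "\n";          // apple, orange, pineapple

// explode() is the inverse of implode().
$parts = explode(',', 'a,b,c');             // array('a', 'b', 'c')

// array_map() returns a new array and leaves the input untouched.
$upper = array_map('strtoupper', $fruits);  // APPLE, ORANGE, PINEAPPLE

// array_push()/array_pop() treat the end of the array as the top of a stack.
array_push($parts, 'd');
$top = array_pop($parts);                   // 'd'

// array_diff() keeps values of the first array missing from the second.
$diff = array_diff(array('a', 'b', 'c'), array('b'));
print_r($diff);                             // keys preserved: 0 => a, 2 => c
var_dump(array_diff($parts, $parts));       // empty array when nothing differs
```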

Reference: http://php.net/manual/en/control-structures.foreach.php

PHP Functions

  • Simple Example
    // Example Call: hello()
    function hello() { echo "Hello!"; }
  • Accepting Function Parameters
    // Example Call: hello('Niki')
    function hello($name) {
      echo "Hello, " . htmlspecialchars($name) . "!"; }
  • Similar to escapeHTML() in JS, htmlspecialchars() is to sanitize output
  • Specifying Default Function Parameters (parameters with defaults must come last)
        // Example Call: hello('Niki') or hello('Niki', 'F')
        function hello($name, $sex = 'M') {}
        function hello2($name, $sex = 'M', $income = 10000){...}
        // Is the function call "hello2('Niki',1000)" legal?
        // Is the function call "hello2('Niki',1000)" "meaningful"?
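
One way to explore the two questions is to run the call. In this sketch hello2() returns a string instead of echoing it (our change), so the effect of positional matching is easy to see:

```php
<?php
// Returning the string (instead of echoing) makes the result easy to inspect.
function hello2($name, $sex = 'M', $income = 10000) {
    return "name=$name sex=$sex income=$income";
}

echo hello2('Niki'), "\n";            // name=Niki sex=M income=10000
echo hello2('Niki', 'F'), "\n";       // name=Niki sex=F income=10000

// Legal, but arguments are matched by position: 1000 lands in $sex.
echo hello2('Niki', 1000), "\n";      // name=Niki sex=1000 income=10000

// To change only $income, the earlier default must be spelled out.
echo hello2('Niki', 'M', 1000), "\n"; // name=Niki sex=M income=1000
```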

Best Practice: To Include an External File

  • E.g., your assignment has a main page and a product description page
  • Some HTML is actually shared by both pages
  • Best Practice: Host the common part in a file and load it dynamically across multiple pages to facilitate code reuse
    • Without PHP execution
      <?php readfile('html/header.html'); ?>
      <h1>Product Description:</h1>
      <!-- Description goes here -->
      <?php readfile('html/footer.html'); 
      ?>
    • With PHP execution - good for including PHP libraries
      <?php include_once('lib/myLib.php'); ?>
  • readfile() is faster than include_once() as no parsing is needed to look for PHP
  • There are also require() and require_once()
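
The difference between the two approaches can be observed with a throwaway file. This is a sketch only: the temporary file and the output buffering are scaffolding for the demonstration, not part of the assignment layout:

```php
<?php
// Create a temporary file that mixes HTML with a PHP snippet.
$file = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($file, '<p><?php echo 1 + 1; ?></p>');

// readfile() copies the bytes to the output without running any PHP.
ob_start();
readfile($file);
$raw = ob_get_clean();

// include parses the file, so the embedded PHP is executed.
ob_start();
include $file;
$executed = ob_get_clean();

unlink($file);
echo $raw, "\n";       // the file content unchanged, PHP tags included
echo $executed, "\n";  // <p>2</p>
```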

Form Handling on Server Side

Form/Request Handling with PHP

  • Given an example of HTTP request:
    POST /admin-process.php?action=cat_insert HTTP/1.1
    Host: secure.shop.ierg4210.org
    Content-Length: 19
    Content-Type: application/x-www-form-urlencoded
    
    name=Fresh%20Fruits
  • Input parameters are stored in some superglobals arrays:
        $_POST['name'] == 'Fresh Fruits'   // true; 
        // Values are auto-urldecoded, '%20' -> ' '
        $_GET['action'] == 'cat_insert'    // true
        $_REQUEST['action'] == 'cat_insert'// true
  • $_REQUEST combines $_GET, $_POST, and $_COOKIE (default order; configurable via the request_order / variables_order settings in php.ini)
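
Outside a web server there is no real request, but the decoding can be imitated with parse_url() and parse_str(). In this sketch $get, $post and $request are stand-ins for the superglobals, and the merge order is an assumption that mirrors GET-then-POST:

```php
<?php
// Stand-ins for the request target and body of the example request.
$requestTarget = '/admin-process.php?action=cat_insert';
$requestBody   = 'name=Fresh%20Fruits';

// Roughly what PHP does to fill $_GET and $_POST.
parse_str(parse_url($requestTarget, PHP_URL_QUERY), $get);
parse_str($requestBody, $post);

// Later sources overwrite earlier ones with the same key.
$request = array_merge($get, $post);

var_dump($post['name'] === 'Fresh Fruits');     // bool(true)
var_dump($get['action'] === 'cat_insert');      // bool(true)
var_dump($request['action'] === 'cat_insert');  // bool(true)
```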

Design Pattern: Validate before further processing

    <?php
    if ($_REQUEST['action'] == 'cat_insert') {
      // See next slide for details
      inputValidate($_POST['name'], '/^[\w\- ]+$/');
      // DB Manipulation with SQL
      DB_insertCategory($_POST['name']);
    }
    ?>
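
The slide only shows the call to inputValidate(), so its body here is a guess: a hypothetical helper that throws when the value is not a string matching the whitelist pattern.

```php
<?php
// Hypothetical helper: the slides show the call, not the implementation.
function inputValidate($value, $pattern) {
    // Reject arrays (e.g. name[]=x) and anything not matching the whitelist.
    if (!is_string($value) || preg_match($pattern, $value) !== 1) {
        throw new Exception('invalid-input');
    }
    return $value;
}

// Simulated request data for a command-line run.
$_POST['name'] = 'Fresh Fruits';
echo inputValidate($_POST['name'], '/^[\w\- ]+$/'), "\n"; // Fresh Fruits

try {
    inputValidate('<script>', '/^[\w\- ]+$/');
} catch (Exception $e) {
    echo $e->getMessage(), "\n"; // invalid-input
}
```

Throwing an exception, rather than returning false, stops processing before the database call unless the caller handles the error explicitly.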

Input - Validation Flaws

  • Severity of the problem
    • Ranked High in 2007, 2010, 2013, and 2017 by OWASP Top 10 Application Security Risks
    • e.g., in the 2017 list, input validation flaws are ranked:
      A1 Injection, A7 Cross-site Scripting, A5 Broken Access Control
  • Root cause: "Unexpected" inputs could lead to unauthorized actions
    • Blurry boundary between user data and code
  • Fundamental Defences: Restrict users' inputs
    • Input Validations - rejecting invalid inputs
      • most effective - whitelisting acceptable data
      • may be insecure - blacklisting malicious characters
        (hard to exhaust: can you blacklist an unknown exploit?)
    • Input Sanitizations - transforming invalid inputs to be safe
      • Type casting: JS: parseInt('666'); PHP: $a = (int)$a;
      • Escape characters (context-dependent):
        e.g., prevent SQL injection (later lectures)

Input - Server-side and Client-side Validations

  • Code at client side (for user experience enhancement)
    • is shipped to the client
    • can be freely manipulated inside browser
  • Code at server-side (for security enforcement)
    • is (supposed to be) hidden from clients
    • sends only the resulting HTML
    • thus cannot be easily bypassed
  • Security Best Practice (for input validation):
    • the server side should be at least as strict as the client side
    • they should be as consistent as possible
    <?php
    // Using the same regular expression as done in JavaScript
    if (preg_match('/^[\w\-\/][\w\-\/\.]*@[\w\-]+(\.[\w\-]+)*(\.[\w]{2,6})$/',
        $_POST['email'])) {
      /* Only validated inputs can go for further processing */
    } else {
      /* reject the input */
      exit();
    } ?>

Process - Database Management

  • SQL (e.g., SELECT *) is to be covered in the Tutorial
  • DB Manipulations with PHP Data Objects (PDO)
    function ierg4210_cat_fetchall() {
      // DB manipulation
      global $db;
      $db = ierg4210_DB();
      $q =$db->prepare("SELECT * FROM categories LIMIT 100;");
      if ($q->execute())
        return $q->fetchAll();  // i.e., an array of categories
    }
    function ierg4210_cat_insert() {
      // input validation or sanitization
      if (!preg_match('/^[\w\-, ]+$/', $_POST['name']))
        throw new Exception("invalid-name");
      // DB manipulation
      global $db;
      $db = ierg4210_DB();
      $q = $db->prepare("INSERT INTO categories (name) VALUES (?)");
      return $q->execute(array($_POST['name']));
       // returns true/false to indicate whether the insert succeeded
    }
  • "Prepared statement" is to prevent SQL injections (details later)
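
A self-contained sketch of the same PDO calls. It assumes the pdo_sqlite extension is available and uses an in-memory database in place of whatever ierg4210_DB() connects to:

```php
<?php
// Assumption: pdo_sqlite is installed; the in-memory database is a stand-in.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE categories (catid INTEGER PRIMARY KEY, name TEXT)');

// The ? placeholder keeps the value separate from the SQL text.
$insert = $db->prepare('INSERT INTO categories (name) VALUES (?)');
$insert->execute(array('Fruits'));
// Even a hostile-looking value is stored as plain data, not run as SQL.
$insert->execute(array("x'); DROP TABLE categories; --"));

$q = $db->prepare('SELECT * FROM categories LIMIT 100');
$q->execute();
$rows = $q->fetchAll(PDO::FETCH_ASSOC);
echo count($rows), "\n";      // 2
echo $rows[0]['name'], "\n";  // Fruits
```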

Process - Design Pattern of Form Handlers

  • Maintain a Single Entrance for Form Handlers
  • HTML: All forms send HTTP requests to admin.php, and associate a unique action name as a hidden parameter with each form
  • PHP: In the centralized entrance admin.php, route HTTP requests to a corresponding function based on action name
  • E.g., a simplified version of admin.php
function ierg4210_cat_fetchall() {
    /* return an array of categories */
}

function ierg4210_cat_insert() {
    /* return true or false to indicate success */
}

Process - Design Pattern of Form Handlers (Cont.)

if (!empty($_REQUEST['action'])) {
  header('Content-Type: application/json');
      // JSON to be discussed in next slide
  try {
    // call corresponding function based on action name
    $targetFunction = 'ierg4210_' . $_REQUEST['action'];
    $returnVal = call_user_func($targetFunction);
    if ($returnVal === false)
      echo json_encode(array('failed'=>true));
    else echo 'while(1);'.json_encode(
            array('success' => $returnVal));
  } catch(Exception $e) {
    echo 'while(1);'.json_encode(
            array('failed' => $e->getMessage()));
  }
} else echo json_encode(array('failed'=>'undefined'));
Pay attention to while(1); and JSON
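
Because the action name comes from the request, one possible hardening is to route only names on an explicit list. The sketch below is an assumption, not the course's admin.php: dispatch() and its $allowed list are hypothetical, the two handlers are stubs, and the while(1); prefix is left out to keep the output easy to read:

```php
<?php
// Stub handlers standing in for the real DB-backed functions.
function ierg4210_cat_fetchall() {
    return array(array('catid' => '1', 'name' => 'Fruits'));
}
function ierg4210_cat_insert() {
    return true;
}

// Hypothetical router: only actions listed here can ever be called.
function dispatch($action) {
    $allowed = array('cat_fetchall', 'cat_insert');
    if (!is_string($action) || !in_array($action, $allowed, true)) {
        return json_encode(array('failed' => 'undefined'));
    }
    try {
        $returnVal = call_user_func('ierg4210_' . $action);
        if ($returnVal === false) {
            return json_encode(array('failed' => true));
        }
        return json_encode(array('success' => $returnVal));
    } catch (Exception $e) {
        return json_encode(array('failed' => $e->getMessage()));
    }
}

echo dispatch('cat_insert'), "\n";   // {"success":true}
echo dispatch('no_such_fn'), "\n";   // {"failed":"undefined"}
```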

Output - HTML vs. JSON (1/4)

  • Traditionally, HTML output is returned after processing
    <?php 
    readfile('html/header.html');
    
    foreach (ierg4210_cat_fetchall() as $cat) {
      /* Re-populate the HTML with $cat['catid'] and $cat['name'] */
    }
    if (ierg4210_cat_insert())
        echo '<h2>The category is created successfully.</h2>';
    
    /* Reproduce other HTML snippets here, e.g., forms 
    */
    readfile('html/footer.html');
    ?>
  • After users submit the forms via an HTML page, a browser has to re-download the "same" HTML page even with a single tiny difference.
  • In my UG years, eXtensible Markup Language (XML) was "trendy."
    • As bulky as HTML
    • Slower to parse than JSON
    • Used in legacy web services supporting SOAP

Output - HTML vs. JSON (2/4)

  • Nowadays, we use JavaScript Object Notation (JSON) format
  • Compact in response size. Fast JSON parser.
  • Facilitate shifting data binding & user-interface (UI) work to client-side
  • E.g., encoding the output of ierg4210_cat_fetchall() will give:
    <?php
    function ierg4210_cat_fetchall() {
        /* return an array of categories */
    }
    function ierg4210_cat_insert() {
        /* return true or false to indicate success */
    }
    header('Content-Type: application/json');
    if (($returnVal=
        call_user_func('ierg4210_'.$_REQUEST['action']))!==false)
      echo json_encode(array('success' => $returnVal));     ?>
    {"success":[{"catid":"1","name":"Fruits"},
                {"catid":"2","name":"Candies"}]}
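
The encoding step can be tried in isolation. A sketch with hard-coded rows in place of a database query; the last lines show a pitfall that is easy to hit after unset():

```php
<?php
// Sample rows in the shape fetchAll() would return (values are illustrative).
$categories = array(
    array('catid' => '1', 'name' => 'Fruits'),
    array('catid' => '2', 'name' => 'Candies'),
);

$json = json_encode(array('success' => $categories));
echo $json, "\n";
// {"success":[{"catid":"1","name":"Fruits"},{"catid":"2","name":"Candies"}]}

// json_decode(..., true) turns the text back into nested PHP arrays.
$decoded = json_decode($json, true);
echo $decoded['success'][1]['name'], "\n"; // Candies

// A PHP array with non-sequential keys becomes a JSON object, not a list.
unset($categories[0]);
echo json_encode($categories), "\n"; // {"1":{"catid":"2","name":"Candies"}}
```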

Output - HTML vs. JSON (3/4)

  • JSON.parse() in JavaScript decodes the JSON output at client-side:
    <script type="text/javascript">
    myLib.ajax({url:'admin-process.php?action=cat_fetchall',
                success:function(output){
      // to decode the returned data into an object
      var json = JSON.parse(output);
      if (json.success) {
        // output each record with proper output sanitizations
        for (var i = 0, record; record = json.success[i]; i++) {
          somewhere.innerHTML +=
                'CatId: ' + parseInt(record.catid)
                + '<br/>' + 'Name: ' + record.name.escapeHTML();
        }
      } else alert('Error!');
    }});
    </script>

Output - HTML vs. JSON (4/4)

  • Advantages of using JSON when compared to HTML
    1. Minimize bandwidth needed (since no redundant download)
    2. JSON parsing is stunningly fast as the format itself is JS (native)!
    3. Loose coupling: PHP - data-intensive processing; JS - UI handling

Ref: http://www.json.org

PHP code debugging