class notes

Crash Course GMU

A compilation of all the notes from all my CS classes at GMU – the ultimate revision guide.

Why did I take the time to compile this? By my final year of college, everything felt like a blur, and it seemed like I had learned nothing. Of course, after taking so many classes, I had definitely gained knowledge. However, I wanted to do one grand revision of everything I had learned, from the beginning to the very end.

My notes are neither the most organized nor fully comprehensive and should not be used as a cram study guide for final exams (you will undoubtedly fail). They are meant to offer a somewhat detailed summary of what you might learn over four years, along with random facts I found interesting. You’ll notice the quality of notes declines as the semester progresses, with a steep drop after midterm season. Alas, such is the life of a student.

Happy studying!

All Classes

Links to jump to any class you need help with:

Undergraduate Computer Science Coursework
CS 110 Essentials of Computer Science
CS 211 Object-Oriented Programming
STAT 344 Probability and Statistics
CS 262 Introduction to Low Level Programming
CS 310 Data Structures
CS 330 Formal Methods and Models
CS 484 Data Mining
CS 478 Natural Language Processing
CS 485 Autonomous Robotics
CS 367 Computer Systems Programming
CS 471 Operating Systems
CS 475 Concurrent and Distributed Systems
CS 483 Analysis of Algorithms
CS 480 Intro to Artificial Intelligence
CS 455 Networking

CS 110 - Essentials of Computer Science

The history of computer science spans several generations, with different tools and engineers marking each period:

  • Vacuum Tubes – Systems programmers wrote programs to convert system languages to machine languages.
  • Transistors – Application programmers wrote instructions for computers to execute operations.
  • Circuit Boards & Operating Systems – Non-programmers could start using computers for tasks.
  • Large-Scale Integration (Silicon Chips) – Enabled word processing and more advanced software.
  • Interconnected CPUs & GPUs – Widespread computer usage and the rise of the internet.

Notable Figures:

  • Ada Lovelace – Often considered the first female programmer. Her work built upon Babbage’s Analytical Engine.
  • Charles Babbage – Designed the Analytical Engine during the Industrial Revolution to automate calculations. Inspired by the Jacquard loom’s punch cards, which allowed for repeated patterns.
  • Alan Turing – Father of theoretical computer science. Developed the Bombe to crack Enigma codes during WWII. His Turing Machine and Turing Test remain foundational in AI research.

Key Concepts:

  • Binary (Base 2): 0-1 (1 bit)
  • Octal (Base 8): 0-7 (3 bits)
  • Decimal (Base 10): 0-9
  • Hexadecimal (Base 16): 0-F (4 bits)
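
To sanity-check base conversions quickly, Python’s built-in conversion functions work well; a minimal sketch:

n = 0b101010                      # binary literal (42 in decimal)
print(bin(n), oct(n), hex(n))     # 0b101010 0o52 0x2a
print(int("2A", 16))              # parse a hexadecimal string -> 42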

Data Compression

Reducing storage space needed for data.

  • Lossless Compression – No data is lost during compression.
  • Compression Ratio – How much a file is reduced in size.
  • Bandwidth – The amount of data that can be transferred at a time.

Types of Encoding:

  • Keyword Encoding – Replaces common words with shorter representations (e.g., “and” → “&”).
  • Run-Length Encoding – Compresses runs of repeated characters (e.g., 1111110001111 → 6 1s, 3 0s, 4 1s → 613041).
  • Huffman Encoding – More frequent characters get shorter bit representations.
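
A minimal run-length encoder sketch (in Python) matching the example above; itertools.groupby does the run grouping:

from itertools import groupby

def run_length_encode(s: str) -> str:
    """Collapse runs of repeated characters into (count, character) pairs."""
    return "".join(f"{len(list(group))}{ch}" for ch, group in groupby(s))

print(run_length_encode("1111110001111"))  # -> "613041": six 1s, three 0s, four 1s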

Digital signals are preferred for data transfer because they are resistant to environmental interference.

Computer Architecture

  • Virtual Machines (VMs): Emulations of physical computers stored as files (“images”).
  • Hypervisor: Software that creates and manages VMs.
  • Cloud Computing: Remote data storage and computing services with concerns over security and third-party reliance.
  • Heap Memory: Used by all parts of a program and dynamically allocated.
  • Stack Memory: Used for local variables and function calls; prone to overflow.

Storage & Hardware

  • Solid State Drives (SSD): Fast, durable, non-mechanical.
  • Hard Disk Drives (HDD): Higher storage capacity, mechanical, cheaper but more fragile.
  • CPU (Central Processing Unit): Executes programs; includes the ALU (Arithmetic Logic Unit), registers, and control logic.

CS 211 - Object-Oriented Programming

Compilation Process

  • .java files are converted to .class files by the Java compiler.
  • Java does not have an assembler and only requires a compiler to convert .java files into bytecode.
  • The Java Virtual Machine (JVM) generates machine code at runtime, acting as an interpreter.
  • Advantage: Since Java includes its own runtime environment, it can be executed on any device. This was likely one of the primary reasons Java was created.

Tokens in Java

Java has six types of tokens:

  1. Reserved Keywords – if, else, int, for, while
  2. Identifiers – x, i, objName, varName
  3. Literals – 7, "hello", 8.9
  4. Operators – +, -, /, *
  5. Separators – [], {}, ()
  6. Terminators – ;

Example:

int count = 7; // reserved, identifier, operator, literal, terminator

Increment Operators

int x = 9;
  • x++ → Evaluates x, then increments it.
    • print(x++) → 9
    • print(x) → 10
  • ++x → Increments x, then evaluates it.
    • print(++x) → 10
    • print(x) → 10

Variable Types in Java

  • Class Variables: Shared across all objects of the class. Changing the variable through one instance affects every instance. Declared using the static keyword.
  • Instance Variables: Each object has its own copy. Modifying one object’s variable does not affect others.
  • Local Variables: Exist only within a method. Once the method returns, the variable’s scope ends.

Primitive Data Types & Their Sizes

Integers
  • byte - 1 byte
  • short - 2 bytes
  • int - 4 bytes
  • long - 8 bytes
Floating-Point Numbers
  • float - 4 bytes
  • double - 8 bytes
Characters
  • char - 1 byte
Booleans
  • boolean - 1 byte (size may vary)

Reference Data Types

  • Any non-primitive type or dynamically created object.
  • Includes:
    • Data structures (e.g., stacks, queues)
    • Objects of a class
Autoboxing & Unboxing
  • Autoboxing: Converting a primitive to a reference type.
  • Unboxing: Converting a reference type to a primitive.

Type Conversion & Typecasting

Implicit Conversion

  • Converts to a larger data type (no data loss).
    int x = 4;
    long y = x;
    

Explicit Conversion (Typecasting)

  • Required when converting to a smaller type.
    long x = 4;
    int y = (int) x;
    

Object References & Comparisons

Demo d = new Demo(); // Creates a new object
Demo d1 = d;         // d1 now references the same object as d
  • d and d1 refer to the same object.
  • d == d1 → true

Jagged Arrays

int[][] array = new int[2][];
array[0] = new int[3];
array[1] = new int[4];

Strings in Java

Creating Strings

String s = "hey";
String t = new String("hi");
  • s == t returns false because their references are being compared.

Correct String Comparison

s.compareTo(t) == 0   // or s.equals(t)

String Immutability

  • Strings in Java are immutable.
  • Operations like += or " " + " " create a new string object.
  • This can cause performance issues in large applications.
  • Use StringBuilder for efficient string manipulation.

Java I/O

  • System.in
  • System.out
  • System.err

Object-Oriented Programming (OOP) Concepts

Classes & Methods
  • Class: A collection of fields (data) and methods (functions).
  • this Reference:
    • Refers to the actual instance variables of a class.
    • Useful in constructors to differentiate between instance variables and input arguments.
Cascading Constructors
class Person {
    Person() {
        this("hi", 2); // Calls the constructor with two parameters
    }

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

Instance vs. Static Methods

  • Instance Methods → Require an object to be called.
  • Static Methods → Cannot access instance variables, but can access static variables.

Method Overloading & Overriding

  • Overloading: Same method name, different parameters.
  • Overriding: Same method name, same parameters, redefined in a subclass.

Encapsulation: Access Modifiers

| Modifier  | Access Level |
|-----------|--------------|
| public    | Accessible from any class |
| private   | Accessible only within the same class |
| protected | Accessible within the same package and subclasses |
| (default) | Accessible within the same package |

Inheritance

  • Defines relationships between parent and child classes (is-a relationship).
  • final class → Cannot be extended.
  • abstract class → Cannot be instantiated directly; must be extended.

Polymorphism

  • Static Polymorphism → Method overloading.
  • Dynamic Polymorphism → Method overriding.

STAT 344 Probability and Statistics

Statistics measures variability. Probability measures uncertainty. No. I have no idea what that means.

Types of Models:

  • Mechanistic → Based on theory/principles (e.g., gas laws)
  • Empirical → Applied science and math
  • Regressional → Builds on empirical models and offers predictions
  • Probabilistic → Attempts to quantify model errors

Fundamental Concepts:

  • Random Experiment: Same input leads to different outputs.
  • Sample Space: Set of all possible outcomes.
    • Discrete Sample Spaces → Finite, quantifiable outcomes.
    • Continuous Sample Spaces → Infinite outcomes (e.g., number line).
  • Event: A subset of the sample space, typically denoted with a capital letter.

Axiomatic Probability:

  • Axiom/Postulate: A statement assumed to be true.
  • Axiomatic System Features:
    • Consistent → No contradictions.
    • Independent → Axioms cannot be derived from each other.
    • Complete → Everything can be derived from axioms.

Conditional Probability:

[ P(A|B) = \frac{P(A \cap B)}{P(B)} ]
Independence of Events:

  • ( P(A | B) = P(A) )
  • ( P(A \cap B) = P(A) \times P(B) )

Bayes’ Theorem

(Came back during NLP Sentiment Analysis)
[ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} ]
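
A tiny numeric sanity check of Bayes’ Theorem in Python — the probabilities here are made up purely for illustration:

# Hypothetical numbers: P(B|A) = 0.9, prior P(A) = 0.01, evidence P(B) = 0.05.
p_b_given_a, p_a, p_b = 0.9, 0.01, 0.05
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.18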

Distribution Functions

Probability distribution → Probabilities associated with specific values of ( x ).

Probability Mass Function (PMF)

Example:

  • ( f(x_i) > 0 ) for discrete values
  • Cumulative Distribution Function (CDF) → running sum of the PMF (integral of the PDF in the continuous case)

Properties of the CDF

  • Describes a probabilistic function.
  • Non-decreasing.
  • Defined for all real numbers.
  • As ( x \to +\infty ), ( P = 1 ); as ( x \to -\infty ), ( P = 0 ).

Example Discrete CDF:

( P(X = x) = \frac{1}{6}, ) for ( x \in {1, 2, 3, 4, 5, 6} ).

Piecewise function:
[ F(x) = \begin{cases}
0, & x < 1 \\
1/6, & 1 \leq x < 2 \\
1/3, & 2 \leq x < 3 \\
1/2, & 3 \leq x < 4 \\
2/3, & 4 \leq x < 5 \\
5/6, & 5 \leq x < 6 \\
1, & x \geq 6
\end{cases} ]

Special Discrete Distribution Functions:

  1. Discrete Uniform Distribution
    • Description: Equally likely outcomes over integer range ([a, b]).
    • Expectation: ( E(X) = \frac{a + b}{2} )
    • Variance: ( \frac{(b-a+1)^2 - 1}{12} )
    • Example: Fair die roll ((a=1, b=6), E(X) = 3.5, Var(X) \approx 2.92 ).
  2. Binomial Distribution (Extension of Bernoulli trials with defined trials)
    • Description: Number of successes in ( n ) independent trials.
    • PMF: ( P(X=k) = C(n,k) p^k (1-p)^{n-k} )
    • Expectation: ( np )
    • Variance: ( np(1-p) )
    • Example: 10 coin flips ((p=0.5), E(Heads) = 5, Var(X) = 2.5 ).
  3. Geometric Distribution
    • Description: Trials until first success.
    • PMF: ( P(X=k) = (1-p)^{k-1} p )
    • Expectation: ( 1/p )
    • Variance: ( (1-p)/p^2 )
    • Example: Drawing cards until an ace appears ((p=4/52), E(Trials) = 13, Var(X) = 156).
  4. Negative Binomial Distribution
    • Description: Trials until ( r ) successes.
    • PMF: ( P(X=k) = C(k-1, r-1) p^r (1-p)^{k-r} )
    • Expectation: ( r/p )
    • Variance: ( r(1-p)/p^2 )
    • Example: Sales calls until 3 sales ((p=0.2), E(Calls) = 15, Var(X) = 60).
  5. Hypergeometric Distribution
    • Description: Successes in draws without replacement (e.g., card games).
    • PMF: ( P(X=k) = \frac{C(K,k) C(N-K,n-k)}{C(N,n)} )
    • Expectation: ( n \times (K/N) )
    • Variance: ( n \times (K/N) \times (1-K/N) \times \frac{N-n}{N-1} )
    • Example: Expected aces in a 7-card hand from a standard deck: ( E(Aces) = 7 \times (4/52) \approx 0.54 ).
  6. Poisson Distribution
    • Description: Events in a fixed interval with known rate.
    • PMF: ( P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!} )
    • Expectation: ( \lambda )
    • Variance: ( \lambda )
    • Example: Call center receives 5 calls/hour, ( E(Calls \text{ in 2 hrs}) = 10, Var(X) = 10 ).
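
A quick way to check the expectation/variance formulas above is to plug them in directly; this small sketch reproduces the die, coin-flip, and ace-drawing examples:

def uniform_stats(a, b):
    return (a + b) / 2, ((b - a + 1) ** 2 - 1) / 12

def binomial_stats(n, p):
    return n * p, n * p * (1 - p)

def geometric_stats(p):
    return 1 / p, (1 - p) / p ** 2

print(uniform_stats(1, 6))      # (3.5, 2.916...)  fair die
print(binomial_stats(10, 0.5))  # (5.0, 2.5)       10 coin flips
print(geometric_stats(4 / 52))  # (13.0, 156.0)    draws until an ace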

Continuous Probability Distributions

  1. Continuous Uniform Distribution
    • Description: Similar to discrete uniform, but for continuous values.
    • PDF: ( f(x) = \frac{1}{b-a}, \quad a \leq x \leq b )
    • Expectation: ( (a+b)/2 )
    • Variance: ( (b-a)^2 / 12 )
  2. Normal (Gaussian) Distribution
    • Description: A symmetric distribution centered around the mean.
    • PDF: ( f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2 / (2\sigma^2)} )
    • Expectation: ( \mu )
    • Variance: ( \sigma^2 )
  3. Exponential Distribution
    • Description: Time between events in a Poisson process.
    • PDF: ( f(x) = \lambda e^{-\lambda x}, \quad x \geq 0 )
    • Expectation: ( 1/\lambda )
    • Variance: ( 1/\lambda^2 )
  4. Gamma Distribution
    • Description: Generalization of exponential; models time until the ( k )-th event in a Poisson process.
    • PDF: ( f(x; k, \lambda) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}, \quad x \geq 0 )
    • Expectation: ( k/\lambda )
    • Variance: ( k/\lambda^2 )
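
A small sketch evaluating two of the PDFs above at arbitrary, illustrative points:

import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def exponential_pdf(x, lam):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

print(normal_pdf(0, 0, 1))       # ~0.3989, the peak of the standard normal
print(exponential_pdf(1, 0.5))   # ~0.3033; mean = 1/lambda = 2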

CS 262 - Low-Level Programming in C

PCAL: The Compilation Process

You will likely be quizzed and tested on the PCAL process:

  • Preprocessing: Removes whitespace, comments, expands macros, and includes header files.
  • Compiling: Converts C source files into assembly code.
  • Assembling: Converts assembly into machine (binary) code.
  • Linking: Links object files together into an executable binary.

Use the gcc compiler to compile C files:

gcc hello.c -o hello  # Creates an executable named 'hello'

Basic Program Structure

All C programs begin execution from the main function:

int main() {
    return 0;
}

To print to the console, include the stdio library:

#include <stdio.h>
int main() {
    printf("hello");
    return 0;
}

Constants and Macros

Use #define to define constants and macros:

#define GREETING "hello"

#include <stdio.h>
int main() {
    printf(GREETING);
    return 0;
}

Reading Input

Use scanf() to read user input:

  • %d – Decimal integer
  • %f – Float
  • %e – Exponential
  • %g – Compact exponential/float
scanf("%d", &x);         // Reads integer into x
scanf("%d,%f", &x, &y);  // Reads integer and float

Data Type Conversions

  • int + float results in a float
  • int / int truncates the decimal (integer division)
  • int i = 33.4f; will truncate to 33

Comparison chaining like i < j < k does not work as expected: i < j evaluates to 0 or 1, which is then compared to k.

Booleans

C uses integers (0 = false, non-zero = true). Use stdbool.h for true and false keywords, but honestly, you can just use 0’s and 1’s and it works the same.

Function Parameters

  • Pass by value: Copies variable; original is unchanged.
  • Pass by reference: Pass pointer to variable; original can be modified.

C defaults to pass-by-value. Use pointers to simulate pass-by-reference.

Variable Scope

  • extern: Declares a global variable defined elsewhere.
  • static: Keeps variable persistent across function calls.

Arrays

int array[10] = {0};                    // All elements set to 0
int array[10] = {1, 2, 3};              // Remaining values default to 0
int array[10] = {[2] = 3, [7] = 4};     // Sparse initialization
int length = (int)(sizeof(array) / sizeof(int));

Function Declaration

<return-type> functionName(<parameters>) {
    // function body
    return <value>;  // Default return type is int
}

Arrays cannot be returned directly, but can be passed as parameters.

Exit

#include <stdlib.h>
exit(0);  // Terminates the program

Pointers

POINTERS — honestly one of the most important concepts in C.

Pointers store memory addresses:

int a = 5;
int *p = &a;  // p holds address of a
*p = 10;      // modifies value of a to 10
  • const int *p: Cannot modify value pointed to
  • int *const p: Cannot change address stored

Pointers & Arrays

Pointers and arrays are closely related, primarily due to memory’s structure as one big array.

You can use pointers instead of array indexing to retrieve a value at each index:

int a[10];
int *p = &a[2];  // Points to index 2
int *q = p + 3;  // Points to index 5

Multidimensional arrays are stored in row-major order, so incrementing a pointer walks across the columns of a row before moving to the next row.

int a[5][5];  // a → a[0]

Strings

String literals are char pointers:

char *s = "abc";

Use <string.h> for string functions:

  • strlen(str)
  • strcat(str1, str2)

Multifile Compilation

#include <filename>   // System headers
#include "filename"   // User-defined files

Gives your file access to functions and variables from other files. Creates a class-like feel to your code.

Preprocessor Directives

Preprocessors exist for:

  • File inclusions (see above)
  • Macro definitions:
    #define SPEED_OF_LIGHT 2.9
    #define IS_EVEN(n) (n % 2 == 0)
    #define HI(...) (printf(__VA_ARGS__))
    
  • Conditional compilation:
    #ifdef CROSS_COMPILE
    // only compiled for BeagleBone
    #endif
    

File I/O

FILE *fp = fopen("in.dat", "r");
fclose(fp);

Modes:

  • r – read
  • w – write (overwrites)
  • a – append

Functions:

  • fread(), fwrite()
  • fseek(), SEEK_SET, SEEK_END, SEEK_CUR

Dynamic Memory

Another really important concept in 262!!

You can claim memory from the heap directly in C:

  • malloc(size) – Allocates memory block
  • calloc(n, size) – Allocates and clears memory
  • realloc(ptr, new_size) – Resizes memory
  • free(ptr) – Frees memory to avoid leaks

C has no garbage collector — you must manually free what you allocate.

Structs

Group variables under one name:

struct Person {
    char name[50];
    int age;
    float salary;
};

struct Person p;
p.age = 5;
strcpy(p.name, "Bob");

With typedef:

typedef struct {
    char name[50];
    int age;
    float height;
} Person;

Person p1;
p1.age = 30;

Structs can be passed to and returned from functions. You can also have nested structs.

Unions

Similar to structs but all members share memory:

union {
    int i;
    double d;
} u;

Saves memory, but changing one affects all others. Don’t use often — makes code hard to read.

Enums

enum Suit { CLUBS, SPADES };

Just like Java enums. Assigns integer values to names.

Linked Lists

You’ll probably implement this in 3–4 classes (262, 367, 471, etc.), so honestly, implementing a linked list becomes second nature.

struct Node {
    int data;
    struct Node* next;
};

struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
newNode->data = value;
newNode->next = NULL;

Always use pointers in linked lists because the memory is dynamically allocated.

Bitwise Operations

Operate directly on bits:

int i = 4;   // 0100
i << 1;      // 1000 → 8
i >> 2;      // 0001 → 1

&  AND
|  OR
^  XOR
~  NOT

CS 310 - Data Structures

Time vs Space Complexity

  • Time complexity – How long it takes for an algorithm to run.
  • Space complexity – How much space is occupied by the algorithm

Trade-offs

  • When is it better to sacrifice time or storage?
  • Typically depends on readability, long-term use case, and how important latency is to the consumer

Dynamic Arrays

  • Arrays without a fixed size; you can add any number of elements.
  • Creation: Since arrays are of fixed size in low-level languages, to implement a dynamic array you have to:
    • Start with an initial size (user-defined or arbitrary)
    • As more values are added and you run out of space, create a new array with double the size
    • Shift all elements from the old array into the new one (an expensive operation, but not frequent)
  • Deletion:
    • To delete a particular index, remove the element and shift everything after it to the left
    • If the array becomes less than 1/3 full, you can consider shrinking (halving) the array to save space
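
A minimal sketch of the doubling/shrinking strategy described above (the initial capacity and the 1/3 threshold follow the notes; everything else is illustrative):

class DynamicArray:
    """Minimal sketch: grow by doubling, shrink by halving when < 1/3 full."""
    def __init__(self, capacity=4):
        self._data = [None] * capacity
        self._size = 0

    def append(self, value):
        if self._size == len(self._data):          # out of space -> double
            self._resize(2 * len(self._data))
        self._data[self._size] = value
        self._size += 1

    def delete_at(self, index):
        for i in range(index, self._size - 1):      # shift everything left
            self._data[i] = self._data[i + 1]
        self._size -= 1
        if 0 < self._size < len(self._data) // 3:   # too empty -> halve
            self._resize(len(self._data) // 2)

    def _resize(self, capacity):
        new_data = [None] * capacity
        for i in range(self._size):                 # copy (the expensive step)
            new_data[i] = self._data[i]
        self._data = new_data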

Linked Lists

  • A data structure in which objects that are not stored sequentially in memory can still be reached: each node stores a value plus a pointer/reference to the next object.
  • Helpful for memory because now consecutive chunks of memory are no longer required, making it easier to create and delete objects dynamically.

Creation

  • In Java, you can create objects for each item in the list.
  • In C/C++, use structs.
  • Typically each node has 2-3 fields: one for the value, one or two for the next/previous pointers.

Types

  • Singly Linked List: Only pointer to the next node; supports one-way traversal.
  • Doubly Linked List: Pointers to both next and previous nodes; easier to traverse and modify.
  • Circular Linked List: Tail points back to the head; forms a loop.

Stacks and Queues

Stack (LIFO)

  • First in, last out.

Queue (FIFO)

  • First in, first out.

Implementation

  • Can be implemented with arrays or linked lists.
  • Linked list implementation is often better for queues because it avoids shifting.
  • Array implementation typically uses two pointers (start and end).
  • When the end pointer gets to the end of the array, and the start pointer is not at index 0, it wraps around (circular buffer).
  • If the queue needs to be expanded, the size is doubled. Values are re-added starting from the start pointer to the end of the original array, and then from index 0 to the end pointer.
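
A minimal circular-buffer queue sketch following the wrap-around and doubling behavior described above:

class CircularQueue:
    """Array-backed queue with wrap-around pointers, doubling when full."""
    def __init__(self, capacity=4):
        self._data = [None] * capacity
        self._start = 0
        self._size = 0

    def enqueue(self, value):
        if self._size == len(self._data):
            self._grow()
        end = (self._start + self._size) % len(self._data)  # wrap around
        self._data[end] = value
        self._size += 1

    def dequeue(self):
        value = self._data[self._start]
        self._start = (self._start + 1) % len(self._data)
        self._size -= 1
        return value

    def _grow(self):
        # Re-add values in order, from the start pointer around to the end.
        old, n = self._data, len(self._data)
        self._data = [None] * (2 * n)
        for i in range(self._size):
            self._data[i] = old[(self._start + i) % n]
        self._start = 0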

Priority Queue

  • Special case where the highest (or lowest) priority element is removed first.
  • Typically implemented with a min or max heap.
  • Order is based on value, not insertion time.

Hashmaps / Hash Tables

  • Solve the problem of search time.
  • For other data structures, lookup is O(n); with hashmaps, it’s O(1) on average.

Implementation

  • Hash tables are implemented as arrays under the hood.
  • A hash function maps a key to an index:
    index = key % array_length  # simple but not great hash function
    
  • The value is stored at that index.

Collision Handling

What if two values map to the same index?

  1. Separate Chaining:
    • The array stores pointer objects (like linked lists).
    • Multiple values that hash to the same index are added to the list.
    • Downside: slower lookups due to list traversal and potential for all values to cluster in one index.
    • A fix: track the longest list; if it exceeds a threshold, rehash with a new function or bigger array.
  2. Linear Probing:
    • If index is taken, look at the next available index (index + 1, +2, …).
    • Pros: avoids wasting space; Cons: clustering and longer probe times.
  3. Quadratic Probing:
    • Similar to linear probing, but the index jump increases quadratically: index + n^2
  4. Double Hashing:
    • Apply another hash function if a collision occurs, and retry until an empty spot is found.

C++’s std::unordered_map uses separate chaining; open addressing (linear/quadratic probing) is common in other hash table implementations.

Eventually, as the table gets filled up, a new hash function is needed and the array has to be extended.
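
A minimal separate-chaining table sketch, using Python’s built-in hash as the hash function (rehashing/resizing is left out to keep it short):

class ChainedHashTable:
    """Separate chaining: each slot holds a list of (key, value) pairs."""
    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present -> overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # collision handled by chaining

    def get(self, key):
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)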


Graphs

  • Graphs are made up of vertices (nodes) and edges (arcs).
  • Edges are defined by vertex pairs.
  • If the pairs are ordered (e.g., [1,2] ≠ [2,1]), the graph is directed (directional edges).

Terminology

  • Path: Sequence of vertices with a valid edge between each pair.
  • Simple Path: No repeated vertices.
  • Cyclic: Start and end vertex are the same.
  • Weighted Path: Edges have associated “costs”.

Representations

Adjacency Matrix
  • nxn matrix where n is the number of vertices.
  • Rows = starting vertex, columns = ending vertex.
  • Best for dense graphs but uses a lot of space.
    |     | A | B | C | D | E |
    |-----|---|---|---|---|---|
    | A   | 0 | 1 | 1 | 1 | 0 |
    | B   | 1 | 0 | 0 | 0 | 1 |
    | C   | 1 | 0 | 0 | 0 | 1 |
    | D   | 1 | 0 | 0 | 0 | 0 |
    | E   | 0 | 1 | 1 | 0 | 0 |
    
Adjacency List
  • Each index represents a vertex.
  • Values are a list of vertices it’s connected to.
  • Great for sparse graphs (which is often the case).
    A: B, C, D
    B: A, E
    C: A, E
    D: A
    E: B, C
    

Graph Traversals

  • BFS: Queue system; level-order traversal.
  • DFS: Stack or recursion; post-order style traversal.
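
A small sketch of both traversals on the adjacency-list example above:

from collections import deque

graph = {"A": ["B", "C", "D"], "B": ["A", "E"], "C": ["A", "E"],
         "D": ["A"], "E": ["B", "C"]}   # adjacency list from above

def bfs(start):
    visited, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

def dfs(node, visited=None):
    visited = visited if visited is not None else set()
    visited.add(node)
    order = [node]
    for neighbor in graph[node]:
        if neighbor not in visited:
            order.extend(dfs(neighbor, visited))
    return order

print(bfs("A"))  # ['A', 'B', 'C', 'D', 'E']
print(dfs("A"))  # ['A', 'B', 'E', 'C', 'D']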

Minimum Spanning Tree (MST)

A spanning tree connects all vertices using the minimum number of edges (|V| − 1); a minimum spanning tree is the spanning tree with the lowest total edge weight.

  • Cannot contain cycles or be disconnected.
  • If using weights, it should have the lowest total weight.
  • A graph can have multiple MSTs; if all weights are unique, there is only one.

Prim’s Algorithm

  1. Eliminate self-loops and duplicate edges (keep the least weight).
  2. Pick a root node.
  3. Pick the least weight edge from the vertex to any unvisited vertex.
  4. Continue picking the minimum edge that connects to the growing tree.
  5. Repeat until all vertices are visited.

Kruskal’s Algorithm

  1. Eliminate self-loops and duplicates.
  2. Sort all edges in increasing order.
  3. Keep connecting edges (in order) as long as they don’t form a cycle.
  4. Stop once MST is formed.
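
A minimal Kruskal sketch using a union-find structure to detect cycles; the 4-vertex edge list is hypothetical:

def kruskal(num_vertices, edges):
    """edges: list of (weight, u, v). Returns the edges of one MST."""
    parent = list(range(num_vertices))

    def find(x):                                # find the root of x's component
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for weight, u, v in sorted(edges):          # edges in increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                            # skip edges that would form a cycle
            parent[ru] = rv
            mst.append((u, v, weight))
    return mst

# Hypothetical 4-vertex graph.
print(kruskal(4, [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]))
# [(0, 1, 1), (2, 3, 2), (1, 2, 3)]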

Topological Sort

  • Can only be performed on a Directed Acyclic Graph (DAG).
  • Common example: course scheduling.
  1. For each vertex, track number of inbound edges (in-degree).
  2. Find all vertices with in-degree 0; add them to result list.
  3. Decrease in-degrees of connected vertices.
  4. Repeat until all vertices are processed.
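
A minimal sketch of this in-degree (Kahn’s) approach; the course-prerequisite DAG is hypothetical:

from collections import deque

def topological_sort(graph):
    """graph: {vertex: [vertices it points to]}. Returns one valid ordering."""
    in_degree = {v: 0 for v in graph}
    for neighbors in graph.values():
        for v in neighbors:
            in_degree[v] += 1
    queue = deque(v for v, d in in_degree.items() if d == 0)  # no prerequisites
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            in_degree[w] -= 1          # "remove" the edge v -> w
            if in_degree[w] == 0:
                queue.append(w)
    return order

# Hypothetical prerequisite DAG: 110 -> 211, 211 -> 310 and 262.
print(topological_sort({"110": ["211"], "211": ["310", "262"], "310": [], "262": []}))
# ['110', '211', '310', '262']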

Heaps (Min/Max)

  • Can be implemented as trees.
  • Pain to insert/delete because of constant balancing.
  • Min heap: smallest element at the top.
  • Max heap: largest element at the top.
  • Used in priority queues.
  • Honestly, you don’t need to know much detail about this.

Trees

  • Trees represent hierarchical structures.
  • Binary Tree: Each node has at most 2 children (outbound edges) and at most 1 inbound edge.
  • Leaf nodes have no children.
  • Root node has no parent.
  • Usually implemented with linked nodes that hold left/right child pointers. Can also be implemented using arrays.

Types of Binary Trees

Full/Proper Binary Tree

Each node has either 0 or 2 children.

      A
     / \
    B   C
   / \
  D   E
Complete Binary Tree

All levels are fully filled except the last, which is filled left to right.

      A
     / \
    B   C
   / \ /
  D  E F
Perfect Binary Tree

Both full and complete, with all leaves at the same depth.

      A
     / \
    B   C
   / \ / \
  D  E F  G
Degenerate Tree

All internal nodes have one child.

A
 \
  B
   \
    C
   /
  D

Tree Traversals (using linked list nodes)

Inorder: left, root, right

def inorder(node):
    if node is None:
        return
    inorder(node.left)
    print(node.val)
    inorder(node.right)

Postorder: left, right, root

def postorder(node):
    if node is None:
        return
    postorder(node.left)
    postorder(node.right)
    print(node.val)

Preorder: root, left, right

def preorder(node):
    if node is None:
        return
    print(node.val)
    preorder(node.left)
    preorder(node.right)

Constructing a Tree from Array Representation

Given Preorder + Inorder → build from the left

Preorder:   [A, B, D, E, C, F, G]
Inorder:    [D, B, E, A, F, C, G]
      A
     / \
    B   C
   / \ / \
  D  E F  G

Given Postorder + Inorder → build from the right

Postorder:  [D, E, B, F, G, C, A]
Inorder:    [D, B, E, A, F, C, G]
      A
     / \
    B   C
   / \ / \
  D  E F  G
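
A small sketch of the preorder + inorder reconstruction, reproducing the tree above (nodes are nested dicts for simplicity):

def build_tree(preorder, inorder):
    """Rebuild a binary tree (as nested dicts) from preorder + inorder lists."""
    if not preorder:
        return None
    root = preorder[0]                       # preorder: root comes first
    split = inorder.index(root)              # inorder: left of root vs right of root
    left_in, right_in = inorder[:split], inorder[split + 1:]
    left_pre = preorder[1:1 + len(left_in)]
    right_pre = preorder[1 + len(left_in):]
    return {"val": root,
            "left": build_tree(left_pre, left_in),
            "right": build_tree(right_pre, right_in)}

tree = build_tree(["A", "B", "D", "E", "C", "F", "G"],
                  ["D", "B", "E", "A", "F", "C", "G"])
print(tree["val"], tree["left"]["val"], tree["right"]["val"])  # A B C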

Other Tree Types (Rarely Tested in Detail)

  • Binary Search Tree (BST): For every node, everything in its left subtree is smaller and everything in its right subtree is larger.
  • AVL Tree: A BST that can self-balance when one side becomes deeper.
  • Red-Black Tree: A more complex self-balancing BST. (You might have 5 pages of notes on this – likely won’t use them.)
  • B Tree: Balanced m-way tree. Binary trees have 2 children; B trees can have many. Each node can hold multiple values.
  • B+ Tree: Like a B Tree, but all data is in the leaf nodes only.

Here’s a table comparing all the data structures.

| Data Structure     | Access   | Insert   | Delete   | Search   | Notes |
|--------------------|----------|----------|----------|----------|-------|
| Array              | O(1)     | O(n)     | O(n)     | O(n)     | Fixed size (in C), insert/delete requires shifting |
| Dynamic Array      | O(1)     | O(1)*    | O(n)     | O(n)     | *Amortized insert at end, resizing occasionally |
| Singly Linked List | O(n)     | O(1)**   | O(1)**   | O(n)     | **O(1) at head, O(n) at tail or specific index |
| Doubly Linked List | O(n)     | O(1)**   | O(1)**   | O(n)     | Better for bidirectional traversal |
| Stack (Array/LL)   | O(n)     | O(1)     | O(1)     | O(n)     | LIFO: push/pop at top |
| Queue (Array/LL)   | O(n)     | O(1)     | O(1)     | O(n)     | FIFO: enqueue at rear, dequeue at front |
| Deque              | O(1)     | O(1)     | O(1)     | O(n)     | Double-ended queue, insert/delete both ends |
| Hash Table         | O(1)     | O(1)     | O(1)     | O(1)     | O(n) worst-case due to collisions, requires hashing |
| Binary Search Tree | O(log n) | O(log n) | O(log n) | O(log n) | Must be balanced (e.g., AVL, Red-Black) |
| AVL Tree           | O(log n) | O(log n) | O(log n) | O(log n) | Self-balancing BST |
| Heap (Min/Max)     | O(n)     | O(log n) | O(log n) | O(n)     | Efficient for priority queue, top is min/max |
| Trie (Prefix Tree) | O(k)     | O(k)     | O(k)     | O(k)     | k = length of string; used for autocomplete, dictionaries |
| Graph (Adj List)   | O(1)     | O(1)     | O(1)     | O(V + E) | Good for sparse graphs |
| Graph (Adj Matrix) | O(1)     | O(1)     | O(1)     | O(V²)    | Good for dense graphs |

CS 330 - Formal Methods and Models

Propositional Logic

Propositions are statements that are either true or false (but not both).

  • Examples:
    • “It is raining.” (True/False)
    • “3 + 4 = 7” (True)

If a and b are propositions, then:

  • a ∨ b (logical OR) is also a proposition.

Logical Equivalence: Two propositions are logically equivalent if their truth tables are the same.

Tautology: A proposition that is always true. Contradiction: A proposition that is always false.

Advanced Example: Show that ((p ∧ q) → r) ≡ (¬r → ¬(p ∧ q))

  • Use contrapositive: p → q ≡ ¬q → ¬p
  • Hence, equivalent by contrapositive.

Set Theory

  • A set is a collection of distinct elements.
    • Extensional: {2, 3, 4}
    • Intensional: {x | x < 10}

Notation:

  • ∅ = Empty set
  • x ∈ S = x is an element of set S
  • R ⊆ S = R is a subset of S (may be equal)
  • T ⊂ S = T is a proper subset (T ≠ S)
  • |S| = Cardinality (number of elements)
  • 2^S = Powerset (set of all subsets of S)

Advanced Example: Let A = {1, 2} and B = {x | x < 4}. Then 2^A = {∅, {1}, {2}, {1, 2}}, and (taking B over the positive integers) A ∪ B = {1, 2, 3}

Functions:

  • Total function: Every input has a mapped output.
  • Partial function: Some inputs may not have outputs.

Logical Operators

  • ¬ or ~ → NOT
  • ∧ → AND
  • ∨ → OR
  • → → Implication (if…then)
  • ↔ → Bi-conditional (if and only if)

Important Logical Laws:

  • Contrapositive: p → q is equivalent to ¬q → ¬p
  • Conditional Law: p → q ≡ ¬p ∨ q
  • Distributive: p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
  • Subsumption: p ∧ (p ∨ q) ≡ p
  • De Morgan’s Laws:
    • ¬(p ∧ q) ≡ ¬p ∨ ¬q
    • ¬(p ∨ q) ≡ ¬p ∧ ¬q

Truth Table Example

| p | q | p → q | ¬q | ¬q → ¬p |
|---|---|-------|----|---------|
| T | T | T     | F  | T       |
| T | F | F     | T  | F       |
| F | T | T     | F  | T       |
| F | F | T     | T  | T       |

Demonstrates equivalence of implication and contrapositive.


Proofs and Reasoning

  • A proof is a sequence of logically valid steps to demonstrate the truth of a statement.
  • Theorem: Proven statement
  • Axiom: Accepted as true without proof
  • Inference rules: Methods to derive conclusions
  • Assumptions: Temporary truths to explore implications (shown in square brackets: [p])

Common Rules:

  • Modus Ponens: From p and p → q, conclude q
  • Modus Tollens: From p → q and ¬q, conclude ¬p

Example Proof:

[p → q]    (Assumption)
[p]        (Assumption)
[q]        (From 1, 2 by Modus Ponens)

Advanced Example: Prove: ((p → q) ∧ (q → r)) → (p → r)

  1. Assume (p → q) and (q → r)
  2. Assume p
  3. From p → q and p, conclude q
  4. From q → r and q, conclude r
  5. Therefore, p → r

Predicate Logic

  • A predicate is a function whose output is either true or false.
    • n > 3 is a predicate.
    • 5 > 3 is a proposition (concrete value).

Truth Set: The set of all values that make a predicate true.

Quantifiers:

  • Universal Quantifier: ∀x P(x) — for all x, P(x) holds
  • Existential Quantifier: ∃x P(x) — there exists an x such that P(x)

Negation:

  • ¬∃x P(x) ≡ ∀x ¬P(x)
  • ¬∀x P(x) ≡ ∃x ¬P(x)

Example:

  • ∀x (x > 0 → x^2 > 0)
  • ∃x (x^2 = 4) ⇒ x could be 2 or -2

Sequences and Program Verification

  • A sequence is an ordered collection of elements (repeats allowed).

Program Verification (Partial Correctness):

  • Prove that the output is correct if the program terminates.
  • Hoare Triple: {P} C {Q} where:
    • P is the precondition
    • C is the code/command
    • Q is the postcondition

Example:

{even(x)} x = x + 3; {¬even(x)}

Advanced Example:

{x = n ∧ n ≥ 0}
sum = 0;
i = 1;
while (i ≤ n) {
    sum = sum + i;
    i++;
}
{sum = n(n+1)/2}

Normal Forms

  • Disjunctive Normal Form (DNF): OR of ANDs
    • (p ∧ q) ∨ (¬p ∧ r)
  • Conjunctive Normal Form (CNF): AND of ORs
    • (p ∨ q) ∧ (¬p ∨ r)

Formal Languages

  • Language: A set of strings
  • Alphabet (Σ): Set of symbols (e.g., {a, b})
  • Σ*: Set of all possible strings using the alphabet
  • L ⊆ Σ*: Language is a subset of all strings

Examples:

  • Alphabet: {a, b}
  • String: abba
  • Language: {a, ab, abb}
  • Empty string: ε (also a valid string in many languages)

Advanced Example:

  • Language: All strings over {a, b} where the number of a’s is even and total length is divisible by 3

Regular Languages and Regex

Regular Languages are defined by finite state automata and closed under union, concatenation, and Kleene star.

Examples:

  • Strings ending in “ab”
  • Strings consisting only of 0s, with an even number of them: regex = (00)*

Regex Examples:

  • (a|b)*: All strings of a’s and b’s
  • a*b+: One or more b’s preceded by zero or more a’s

Advanced Regex:

  • Strings that contain “aba” as a substring: .*aba.*

Regular Grammars

Productions:

  • X → aX, X → a, X → ε

Example (basic): Strings ending in “a”:

S → aS | bS | a

Advanced Example: Strings with at least one a followed by one b:

S → aA
A → bB | aA
B → aB | bB | ε

Finite Automata (DFA)

DFA Example: Strings ending in “ab”

  • States: q0 (start), q1, q2 (accept)
  • Transitions (q0 = no progress, q1 = last char was “a”, q2 = last two chars were “ab”):
    • q0 --a--> q1, q0 --b--> q0
    • q1 --a--> q1, q1 --b--> q2
    • q2 --a--> q1, q2 --b--> q0

Example: Accept strings with even number of a’s and odd number of b’s

  • Need 4 states to track parity of a and b
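
A minimal simulation of the ends-in-“ab” DFA above (the transition table matches the one listed):

# DFA for strings over {a, b} that end in "ab": q2 is the accepting state.
transitions = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}

def accepts(s):
    state = "q0"
    for ch in s:
        state = transitions[(state, ch)]
    return state == "q2"

print(accepts("aab"))   # True  (ends in ab)
print(accepts("aba"))   # False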

NFA (Non-deterministic Finite Automata)

NFA Example: Accept strings with a substring “aba”

  • Transitions:
    • q0 --a--> q1
    • q1 --b--> q2
    • q2 --a--> q3 (accept)
    • Also include q0 transitions to itself on any character (to skip a prefix) and q3 transitions to itself on any character (so extra characters after “aba” are still accepted)

Pushdown Automata (PDA)

Used for languages like balanced parentheses:

PDA Example: Recognize L = { a^n b^n | n ≥ 0 }

  • Push an a on the stack for each input a
  • Pop an a from stack for each input b
  • Accept if stack is empty at end

Advanced PDA Example: L = { a^n b^m c^n | n, m ≥ 0 }

  • Push for a, ignore b, pop for c
  • Accept on empty stack
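
A small stack-based checker mimicking the a^n b^n PDA above:

def accepts_anbn(s):
    """Stack-based check for L = { a^n b^n | n >= 0 }, mimicking the PDA above."""
    stack = []
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:                 # an 'a' after a 'b' is rejected
                return False
            stack.append("a")          # push for each a
        elif ch == "b":
            seen_b = True
            if not stack:
                return False
            stack.pop()                # pop for each b
        else:
            return False
    return not stack                   # accept iff the stack is empty

print(accepts_anbn("aabb"), accepts_anbn("aab"))  # True False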

Context-Free Grammars (CFG)

CFGs can express nested patterns like palindromes and balanced expressions.

Example:

S → aSb | ε

Generates strings like: “”, “ab”, “aabb”, “aaabbb”, etc.

Advanced CFG: Palindromes over {a, b}:

S → aSa | bSb | a | b | ε

Turing Machines

Turing machines use an infinite tape to simulate any computation.

Basic Example: Accept strings with equal number of as and bs

  • Scan tape marking a with X, search and mark b with Y, repeat

Advanced Example: Binary incrementer

  • Input: Binary number like 1101
  • Operation: Increment by 1 → 1110
  • Turing Machine walks right, flips trailing 1s to 0s, then flips first 0 to 1

CS 484 - Data Mining

Note: I feel like they try to scare you away in this class with a lot of math, but push through it — I promise it becomes more manageable later. You’re allowed a cheat sheet, so just write down all the math. I high-key forgot half of linear algebra by the time I took this class. My notes don’t have too much math, probably because I myself didn’t understand it.


Data Mining

Data Mining is the process of discovering patterns and anomalies in large datasets.

Data = collection of data objects

  • Discrete attributes: Categorical values
    • Nominal: no order (e.g., gender, color)
    • Ordinal: ordered (e.g., star ratings)
  • Continuous attributes: Numeric
    • Interval (e.g., temperature)
    • Ratio (e.g., weight)

Data Cleaning

Raw data isn’t ready for machine learning. It often has:

  • Missing values
  • Duplicates
  • Non-numerical data (qualitative)
  • Skewed distributions

Solutions include:

  • Replace missing values with mean, median, or mode
  • Merge or remove duplicates
  • One-hot encoding/vectorization for categorical variables
  • Normalize/standardize data (e.g., z-score normalization)
  • Dimensionality reduction to remove redundancy

K-Nearest Neighbors (KNN)

KNN is a lazy learning algorithm that classifies a point based on its k nearest labeled neighbors.

  • Time Complexity: O(nD + nK)
  • Hyperparameters:
    • k: Number of neighbors
    • Distance metric (Euclidean, Manhattan, Cosine)

Effects of k:

  • Small k → overfitting
  • Large k → underfitting

Variants:

  • Condensed KNN: Keeps only decisive points (usually boundary cases)
  • Distance weighting: Closer points are weighted more heavily

Validation Techniques

  • Validation Set: Separate data for model evaluation
  • Cross-Validation: Split training data into k folds; each fold used once for validation

Model Evaluation: Confusion Matrix

Binary Classification Example

Suppose we have the following results from a binary classifier:

  • True Positives (TP): 50
  • False Negatives (FN): 10
  • False Positives (FP): 5
  • True Negatives (TN): 100
|                 | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | TP = 50            | FN = 10            |
| Actual Negative | FP = 5             | TN = 100           |

Calculations:

  • Recall (Sensitivity): Recall = TP / (TP + FN) = 50 / (50 + 10) = 50 / 60 = 0.833

  • Precision: Precision = TP / (TP + FP) = 50 / (50 + 5) = 50 / 55 = 0.909

  • Accuracy: Accuracy = (TP + TN) / Total = (50 + 100) / (50 + 100 + 5 + 10) = 150 / 165 = 0.909

  • F1 Score: F1 = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.909 * 0.833) / (0.909 + 0.833) = 2 * 0.757 / 1.742 ≈ 0.87
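
A small helper that reproduces the binary-classification numbers above:

def binary_metrics(tp, fn, fp, tn):
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, accuracy, f1

# Reproduces the example above: TP=50, FN=10, FP=5, TN=100.
print(binary_metrics(50, 10, 5, 100))
# (0.833..., 0.909..., 0.909..., 0.869...)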

ROC & AUC

  • ROC Curve: Plots TPR vs FPR
  • AUC (Area Under Curve): Indicates model’s ability to distinguish between classes. 1 = perfect, 0.5 = random guessing

Multi-Class Classification Example

Suppose we have a 3-class problem:

| Class | TP | FP | FN |
|-------|----|----|----|
| A     | 40 | 10 | 5  |
| B     | 30 | 15 | 10 |
| C     | 50 | 5  | 20 |

Per-Class Metrics:

  • Class A:
    • Precision = 40 / (40 + 10) = 0.80
    • Recall = 40 / (40 + 5) = 0.89
  • Class B:
    • Precision = 30 / (30 + 15) = 0.67
    • Recall = 30 / (30 + 10) = 0.75
  • Class C:
    • Precision = 50 / (50 + 5) = 0.91
    • Recall = 50 / (50 + 20) = 0.71

Macro-Averaged Metrics:

  • Macro Precision: (0.80 + 0.67 + 0.91) / 3 = 0.793
  • Macro Recall: (0.89 + 0.75 + 0.71) / 3 = 0.783
  • Macro F1 Score:
    • F1_A = 2 * (0.80 * 0.89) / (0.80 + 0.89) = 0.84
    • F1_B = 2 * (0.67 * 0.75) / (0.67 + 0.75) = 0.71
    • F1_C = 2 * (0.91 * 0.71) / (0.91 + 0.71) = 0.80
    • Macro F1 = (0.84 + 0.71 + 0.80) / 3 = 0.783

Micro-Averaged Metrics:

  • Total TP = 120, FP = 30, FN = 35
  • Micro Precision = 120 / (120 + 30) = 0.80
  • Micro Recall = 120 / (120 + 35) ≈ 0.774
  • Micro F1 = 2 * (0.80 * 0.774) / (0.80 + 0.774) ≈ 0.787

PCA (Principal Component Analysis)

Goal: Reduce dimensionality while preserving variance

PCA Steps:

  1. Standardize the data (subtract mean, divide by std deviation)
  2. Compute the covariance matrix of the centered data: S = (1/(n-1)) * X^T * X
  3. Compute eigenvalues and eigenvectors of the covariance matrix
  4. Sort eigenvalues in descending order and pick top k eigenvectors
  5. Project the original data X onto the new subspace using the selected eigenvectors

Worked Example: Suppose we have 2D points: (2, 0), (0, 2), (3, 1), (1, 3)

  1. Standardize → subtract mean: mean_x = 1.5, mean_y = 1.5
  2. Centered data: (0.5, -1.5), (-1.5, 0.5), (1.5, -0.5), (-0.5, 1.5)
  3. Covariance matrix (sample covariance, 1/(n-1)) S =
    [[ 1.67, -1.00],
     [-1.00,  1.67]]
    
  4. Eigenvalues: λ1 ≈ 2.67, λ2 ≈ 0.67 → keep the component with λ1 ≈ 2.67
  5. Project the data onto the principal direction (the eigenvector for λ1 ≈ 2.67)
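
A quick NumPy check of the worked example (np.cov uses the sample covariance, i.e., the 1/(n-1) convention):

import numpy as np

X = np.array([[2, 0], [0, 2], [3, 1], [1, 3]], dtype=float)
X_centered = X - X.mean(axis=0)                 # step 1: subtract the mean
S = np.cov(X_centered, rowvar=False)            # step 2: sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)            # step 3: eigen-decomposition
print(S)            # [[ 1.67 -1.  ] [-1.    1.67]] (rounded)
print(eigvals)      # [0.67 2.67] -> keep the component with eigenvalue ~2.67
projected = X_centered @ eigvecs[:, -1]         # step 5: project onto top component
print(projected)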

Decision Trees

Used for classification by creating splits on attributes.

How to choose split?

  • Gini Index: Gini(D) = 1 - Σ (p_i)^2
  • Information Gain: IG(D, A) = Entropy(D) - Σ (|D_i| / |D|) * Entropy(D_i)

Gini Index Example: Consider a binary dataset with 6 samples: 4 Positive, 2 Negative

Gini = 1 - (4/6)^2 - (2/6)^2 = 1 - 0.444 - 0.111 = 0.444

Split on an attribute A results in:

  • Left split: 2P, 1N → Gini = 1 - (2/3)^2 - (1/3)^2 = 0.444
  • Right split: 2P, 1N → same Gini = 0.444 Overall Gini = 0.444

Information Gain Example: Original entropy:

Entropy = - (4/6)log2(4/6) - (2/6)log2(2/6) ≈ 0.918

If attribute split gives:

  • Left: 2P, 1N → Entropy = 0.918
  • Right: 2P, 1N → same Entropy IG = 0.918 - (3/6)0.918 - (3/6)0.918 = 0 So no gain — try different attribute
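
Small helper functions reproducing the Gini and entropy numbers above:

import math

def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

print(gini([4, 2]))      # 0.444...  (parent node above)
print(entropy([4, 2]))   # 0.918...
print(gini([2, 1]))      # 0.444...  (each child split)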

Linear Regression

Goal: Predict a continuous output from a linear combination of features. Essentially a glorified y = mx + b, but with as many variables (the x’s) as needed to represent all the features. The main goal is to find the weight vector (the m) that minimizes the overall RSS.

Model:

y = w_0 + w_1x_1 + w_2x_2 + ... + w_nx_n

Loss Function (RSS):

RSS = Σ(y_i - ŷ_i)^2 = Σ(y_i - (w^T x_i))^2

Gradient Descent Update Rule:

w_j = w_j - α * ∂RSS/∂w_j = w_j - α * Σ(2 * (y_i - ŷ_i) * -x_ij)

Step size is an important hyperparameter: too small and learning is too slow; too big and the weights bounce back and forth across the minimum.

Stopping rule: if the validation loss changes by less than some threshold, stop training.

Example:

  • Dataset: {(x=1, y=2), (x=2, y=4)}
  • Start with: w = 0, b = 0
  • Predictions: ŷ = 0 for both points
  • Errors: e = y - ŷ = [2, 4]
  • Gradient: ∂RSS/∂w = -2 * (2*1 + 4*2) = -20
  • Update: w = 0 + α * 20
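
The same single gradient step in code; the learning rate α = 0.01 is an arbitrary choice for illustration:

# One gradient-descent step on the tiny dataset above (w = 0, alpha = 0.01).
xs, ys = [1, 2], [2, 4]
w, alpha = 0.0, 0.01

preds = [w * x for x in xs]                                      # all predictions are 0
grad = -2 * sum((y - p) * x for x, y, p in zip(xs, ys, preds))   # = -20
w = w - alpha * grad                                             # w = 0 + 0.01 * 20 = 0.2
print(grad, w)                                                   # -20.0 0.2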

Regularization:

  • L1 (Lasso): λ * Σ|w_i| → sparse weights – good for feature selection
  • L2 (Ridge): λ * Σw_i^2 → smaller weights – decreases outlier sensitivity

Logistic Regression

Used for binary classification because the sigmoid function is bounded between 0 and 1.

Bernoulli Model:

P(y|x) = σ(w^T x) = 1 / (1 + e^{-w^T x})

Loss Function (Binary Cross Entropy):

L = -Σ [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)]

Multiclass (Softmax): The output of softmax is a list of probabilities summing to 1. The number which is the highest is the most likely class it belongs to.

P(y=j|x) = e^{z_j} / Σ e^{z_k}

Cross-Entropy Loss (Softmax):

L = -Σ y_ij * log(softmax_j)

Gradient:

∂L/∂w_j = Σ (p_j - y_j) * x

Perceptrons and Neural Networks

Perceptron: A single-layer neural network that takes in multiple inputs, applies weights, sums them, and uses an activation function to produce a binary output.

  • Training: Perceptrons learn by adjusting weights after each sample. It compares the predicted output with the true label, calculates the error, and updates the weights using the perceptron learning rule.
  • Update rule:
    w = w + α * (y - ŷ) * x
    

Example: Suppose we want a perceptron that performs logical AND:

  • Input: x1, x2 ∈ {0,1}
  • Weights: w1 = 1, w2 = 1, bias = -1.5
  • Activation: output = 1 if (x1w1 + x2w2 + bias) > 0, else 0

Truth table:

  • x1=0, x2=0 → output = 0
  • x1=1, x2=0 → output = 0
  • x1=0, x2=1 → output = 0
  • x1=1, x2=1 → output = 1
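
The same AND perceptron in code, using the weights and bias from the example:

def perceptron_and(x1, x2, w1=1, w2=1, bias=-1.5):
    """AND gate from the example: fire only when the weighted sum is positive."""
    return 1 if x1 * w1 + x2 * w2 + bias > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron_and(x1, x2))
# 0 0 0 / 0 1 0 / 1 0 0 / 1 1 1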

Multilayer Perceptron (MLP):

  • A type of feedforward neural network with one or more hidden layers.
  • Forward propagation: Moves information forward through layers.
  • Backward propagation: Sends error backward and updates weights using gradients.
  • Hidden layers:
    • Capture complex, non-linear relationships
    • Automatically learn feature combinations and perform dimensionality reduction
  • Increasing neurons allows learning more complex patterns
  • Increasing hidden layers increases the abstraction and hierarchy in the model

Too many neurons/layers → risk of overfitting. Too few → underfitting.


Activation Function Options:

  • Sigmoid: 1 / (1 + e^{-x}) — Not recommended in hidden layers due to vanishing gradient problem, but fine in output layers for binary classification.
  • Tanh: Similar to sigmoid but ranges [-1,1] — zero-centered.
  • ReLU: max(0, x) — Default choice for hidden layers; efficient and helps convergence.
  • Leaky ReLU: Outputs a small negative value for negative inputs instead of zero.
  • SELU: Scaled ELU — good for fully connected layers and self-normalizing networks.

Dropout Layers:

  • Each neuron has a probability of being ignored (set to 0) during training.
  • Used to prevent overfitting and improve generalization.
  • Only active during training, not testing.

Block Feature Vectors:

  • Used especially for text — splits input into fixed-sized blocks and processes each independently.

Generative vs. Discriminative Models:

  • Generative: Model the full data distribution P(x, y) to understand and generate data (e.g., Naive Bayes, GANs).
  • Discriminative: Focus only on P(y|x) — learning boundaries between classes (e.g., logistic regression, SVM).

Deep Learning:

  • More neurons + more layers = deeper models
  • Necessary for feature abstraction and complex data types like text, speech, or images

Convolutional Neural Networks (CNNs):

  • Specially designed for image and grid-based data.
  • Input: Raw image (matrix of pixels)
  • Convolutional filter: Learns local patterns like edges, corners
  • Feature map: Output of applying a filter
  • Activation function: Adds non-linearity to feature maps
  • Pooling layer: Downsamples feature maps, often with max pooling
  • Flatten: Converts pooled maps into a 1D vector for fully connected layers

Gradient Optimizers:

  • SGD: Stochastic Gradient Descent — simple and widely used
  • Adagrad: Adapts learning rate per feature; helps with sparse data
  • RMSprop: Adjusts learning rate using a moving average of gradients
  • Adam: Combines RMSprop and momentum; widely used for deep learning

Batch Normalization:

  • Normalizes activations in hidden layers
  • Helps mitigate exploding/vanishing gradients
  • Stabilizes and accelerates training

MLP Architecture:

  • Input → Hidden Layers → Output
  • Use activation functions like ReLU, sigmoid, tanh

Backpropagation:

  • Compute error from output
  • Propagate error backward through layers
  • Update weights using gradient descent

Dropout: Randomly drop neurons during training to reduce overfitting.


Recommender Systems

Collaborative Filtering:

  • User-based filtering: Predict the rating of user u on item i from the ratings given by similar users v:
    r̂_ui = r̄_u + (Σ sim(u,v) * (r_vi - r̄_v)) / Σ |sim(u,v)|
    
  • Item-based filtering: Predict the rating from the similarity between item i and the items user u has already rated.

Computation heavy! Hence matrix factorization, which decomposes the ratings matrix into two smaller matrices and creates latent factors that are not explicitly defined but inferred from the data.

Matrix Factorization:

  • Approximate R ≈ U * V^T
  • Optimize latent features U and V using SGD
  • Loss: (R - UV^T)^2 + λ (||U||^2 + ||V||^2)

Example: User-Item ratings:

    M1 M2 M3
U1:  5  ?  3
U2:  4  2  ?

After factorization:

  • Estimate U1’s missing rating using dot product of U1-row and M2-column in factorized matrices

Association Rule Mining

Terms:

  • Support: Fraction of transactions with both A and B
  • Confidence: P(B | A) = support(A and B) / support(A)

Apriori Principle: All subsets of a frequent itemset must be frequent.

Example: Transactions:

  1. {milk, bread}
  2. {milk, butter}
  3. {bread, butter}
  4. {milk, bread, butter}

Support:

  • support({milk, bread}) = 2/4 = 0.5

Confidence:

  • confidence(milk → bread) = support(milk & bread) / support(milk) = 0.5 / 0.75 = 0.67

Clustering

K-means Clustering:

  1. Choose k initial centroids
  2. Assign each point to the nearest centroid
  3. Update centroids as the mean of points in each cluster
  4. Repeat until convergence

Example:

  • Data: [1, 2, 10, 11]
  • Initial centroids: 1 and 10
  • Iteration 1: [1, 2] → Cluster 1, [10, 11] → Cluster 2
  • New centroids: 1.5 and 10.5 → converged

Hierarchical Clustering:

  • Agglomerative: Merge closest clusters iteratively
  • Divisive: Split cluster until each point stands alone

DBSCAN:

  • Parameters: eps and minPts
  • Classify core, border, and noise points
  • Example: Points (1,2,3,10,11); eps=2, minPts=2
    • Core: {1,2,3}, Cluster 1
    • Core: {10,11}, Cluster 2

Clustering Evaluation

  • SSE (Sum of Squared Errors): Measures cohesion
  • SSB (Between groups): Measures separation
  • Homogeneity: All clusters contain only members of a single class
  • Completeness: All members of a given class are in the same cluster
  • V-measure: Harmonic mean of homogeneity and completeness

CS 478 - Natural Language Processing

Logistic Regression

Sentiment analysis problem – binary classification (is this tweet positive or negative?).

Feature Extraction: Create a large vector containing every single word in the vocabulary and assign value 1 if the word is present in the tweet or 0 if not.

  • Problem: Very sparse vector since most tweets will not have many overlapping words, resulting in slow prediction.

Alternate Strategy:

  • Create classes with positive and negative word connotations.
  • For a given tweet, count the number of positive and negative words. Whichever class has more words determines the sentiment.

Preprocessing a Tweet:

  • Remove common words (“a”, “the”, etc.)
  • Remove punctuation
  • Remove Twitter handles and URLs

Bag of Words Approach:

  • Convert all words to lowercase
  • Perform stemming (e.g., “stopped” → “stop”)
  • Compare each word to a positive/negative dictionary and count occurrences

Example:

  • Tweet: “I loved the new Batman movie! Absolutely amazing.”
  • After cleaning: [“love”, “new”, “batman”, “movie”, “absolutely”, “amazing”]
  • Match with dictionary: 3 positive words → classify as positive sentiment

Text Normalization

  • Tokenization – Identify words in a sentence
  • Lemmatization – Normalize words to their root form (“am”, “are”, “is” → “be”)
  • Stemming – Remove suffixes (“happily” → “happi”)
  • Segmentation – Break text into sentences

Problems with Tokenization:

  • Hyphens
  • Acronyms
  • Contractions
  • Multi-word units (“New York”)

Ambiguity:

  • Prepositional phrase attachment is confusing (“I saw the man with a telescope”)

Byte Pair Encoding (BPE)

Start with every character as a token and build a dictionary by:

  1. Merging adjacent tokens
  2. Tracking frequency
  3. Keeping merges that occur above a frequency threshold
  4. Repeating for groups of 3, 4… until frequency drops or no more segments

Example:

  • Text: “low lower lowest”
  • Initial tokens: [l, o, w], [l, o, w, e, r], [l, o, w, e, s, t]
  • Merge most frequent pair: (l, o) → lo
  • New tokens: [lo, w], [lo, w, e, r], …
  • Continue merging: (lo, w) → low, (low, e) → lowe, etc.

Purpose: Handles rare/unseen words better; enables subword segmentation.


N-Gram Modeling

Predict the next word based on the n-1 previous words.

  • Unigram: P(word)
  • Bigram: P(w2 | w1)
  • Trigram: P(w3 | w1, w2)

Example: Corpus: “I love NLP. I love AI.”

  • Bigrams: (“I love”), (“love NLP”), (“love AI”)
  • P(“love” | “I”) = 2/2 = 1
  • P(“AI” | “love”) = 1/2 = 0.5
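
A minimal bigram-count sketch reproducing the probabilities above (the period is kept as its own token for simplicity):

from collections import Counter

tokens = "I love NLP . I love AI .".split()
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)

def bigram_prob(w1, w2):
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("I", "love"))     # 1.0
print(bigram_prob("love", "AI"))    # 0.5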

Issues: Zero probability for unseen n-grams.

Solutions:

  • Laplace Smoothing: Add 1 to all counts.
  • Interpolation: Combine uni-, bi-, and tri-gram models linearly.
  • Backoff: If trigram not seen, use bigram, then unigram.

Evaluation Metrics

  • Log-Likelihood: Measures how likely a model generated a sequence. Avoids underflow by summing log probabilities.
    LL = Σ log P(w_i | w_{i-1})
    
  • Cross Entropy:
    H(p) = -1/N Σ log_2 P(w_i)
    

    Lower = better

  • Perplexity:
    PP = 2^H(p)
    
  • Lower perplexity = better; interpreted as the average number of choices per word

Example: Suppose cross-entropy H(p) = 1.5 → PP = 2^1.5 ≈ 2.828 → model has ~2.8 likely choices per word


Decoding Strategies

  • Greedy: Always pick the highest probability word
  • Beam Search: Explore top n paths and keep the best long-term one
  • Sampling: Randomly pick a word based on probabilities
  • Top-k Sampling: Randomly choose from top-k most likely words

Neural Networks

See Data Mining section. Same architecture and training applies.


Zipf’s Law

The frequency of a word is inversely proportional to its rank:

  • Rank 1: “the” → appears 22000 times
  • Rank 2: “of” → appears 11000 times

Word Embeddings

Embeddings are dense vector representations of words, often trained via:

  • Word2Vec
    • CBOW: Predict target from context
    • Skip-gram: Predict context from target

Distributional Representation:

  • Based on co-occurrence — similar words share contexts
  • E.g., “cat” and “dog” occur with “fur”, “pet”, “tail” → similar vectors

Distributed Representation:

  • Encodes concepts on separate dimensions
  • E.g., “bank” → [finance_dim=1, river_dim=0.3]

Count-Based Embeddings

Build word-context matrix using frequency statistics

TF-IDF (Term Frequency-Inverse Document Frequency)

TF = (count of term in doc) / (total terms in doc)
IDF = log(N / df)
TF-IDF = TF * IDF

Example:

  • Term “deep” appears 3 times in a doc of 100 words
  • Appears in 5 of 1000 docs
    TF = 3 / 100 = 0.03
    IDF = log10(1000 / 5) = log10(200) ≈ 2.30
    TF-IDF = 0.03 * 2.30 ≈ 0.069
    

PMI (Pointwise Mutual Information)

PMI(w, c) = log( P(w, c) / (P(w) * P(c)) )

Example:

  • P(w, c) = 0.001; P(w) = 0.01; P(c) = 0.005
    PMI = log10(0.001 / (0.01 * 0.005)) = log10(0.001 / 0.00005) = log10(20) ≈ 1.30
    
  • TF-IDF: good for document classification/search
  • PMI: useful for word similarity
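
A quick check of the two worked examples above (both use base-10 logs, as in the notes):

import math

def tf_idf(term_count, doc_length, num_docs, doc_freq):
    tf = term_count / doc_length
    idf = math.log10(num_docs / doc_freq)          # base-10 log, as in the example
    return tf * idf

def pmi(p_wc, p_w, p_c):
    return math.log10(p_wc / (p_w * p_c))          # base-10 log, as in the example

print(tf_idf(3, 100, 1000, 5))      # ~0.069
print(pmi(0.001, 0.01, 0.005))      # ~1.30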

Word-Context Matrix

A co-occurrence matrix where:

  • Rows = words
  • Columns = context words (e.g., 5 words before/after)
  • Values = count or normalized metric (PMI, TF-IDF)

Handling Long-Distance Dependencies

Problem: Words can depend on faraway tokens

Solution: Recurrent Neural Networks (RNN)

  • Include feedback loop to maintain internal state
  • Struggles with long-term dependencies due to vanishing gradients

LSTM (Long Short-Term Memory):

  • Introduces memory cells and gates (input, forget, output)
  • Maintains long-term context using additive updates

CS 485 - Robotics

Note: This class was actually so confusing, so I don’t have many detailed notes.


Rigid Body Transformations

Rigid body transformations involve orientation and position. These include translations and rotations, which are typically represented as matrix transformations.

Example of matrix representation [R|t] (2x3):

| R11 R12 t1 |
| R21 R22 t2 |

Where R is a 2x2 rotation matrix and t is the translation vector.

Pose = Position + Orientation


Global Warping

Applies a fixed set of operations (e.g., rotate, scale) to every point in space, usually represented as matrix operations.

2x2 Matrix Examples:

  • Scaling:
    | sx  0 |
    | 0  sy |
    
  • Rotation (angle θ):
    | cos(θ) -sin(θ) |
    | sin(θ)  cos(θ) |
    
  • Reflection (over y-axis):
    | -1  0 |
    |  0  1 |
    
  • Shear (x-direction):
    | 1  shx |
    | 0   1  |
    

Translation in Homogeneous Coordinates

Translation requires an extra coordinate (homogeneous coordinate).

3x3 Translation Matrix:

| 1  0  tx |
| 0  1  ty |
| 0  0   1 |

A transformation using homogeneous coordinates is considered affine if the last row is [0, 0, 1].

A homography is a projective transformation where the last row is [e, f, 1].


Layers of Transformation

| Type            | DOF | Preserves |
|-----------------|-----|-----------|
| Translation     | 2   | Straight lines, parallelism, angles, lengths, orientation |
| Rigid/Euclidean | 3   | Straight lines, parallelism, angles, lengths |
| Similarity      | 4   | Straight lines, parallelism, angles |
| Affine          | 6   | Straight lines, parallelism |
| Projective      | 8   | Straight lines |

Matrix Representations:

  • Translation:
    | 1  0  tx |
    | 0  1  ty |
    | 0  0   1 |
    
  • Rigid:
    | cos(θ) -sin(θ) tx |
    | sin(θ)  cos(θ) ty |
    |   0       0     1 |
    
  • Similarity:
    | s*cos(θ) -s*sin(θ) tx |
    | s*sin(θ)  s*cos(θ) ty |
    |     0         0     1 |
    
  • Affine:
    | a  b  tx |
    | c  d  ty |
    | 0  0   1 |
    
  • Projective:
    | a  b  tx |
    | c  d  ty |
    | e  f   1 |
    

Inverse Kinematics

You know the end-effector’s position and want to determine the joint angles or movements to get there.

  • Solving triangle geometry backwards.
  • Adds complexity in 3D due to orientation.

TF (Transform Library in ROS): Handles frame tracking and transformations in ROS.


Motion Control

Turns high-level goals into actuator commands.

  • Controller sends commands to actuators to affect system state.

Types:

  • Open-loop: No feedback from environment (no sensors).
  • Closed-loop: Feedback from sensors is used to update actions.

PID Controller (Proportional-Integral-Derivative)

A closed-loop system that combines three components:

  • Proportional (P): Based on current error
  • Integral (I): Accumulated past error
  • Derivative (D): Rate of change of error

PID Equation:

u(t) = K_p * e(t) + K_i * ∫e(t)dt + K_d * de(t)/dt

Where:

  • u(t) is the control output
  • e(t) is the error at time t
  • K_p, K_i, K_d are tuning constants

Example:

  • Desired position = 10 units
  • Current = 7 units → error = 3
  • Proportional term = Kp * 3
  • Integral term = Ki * total accumulated error
  • Derivative = Kd * (change in error)
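
A minimal C sketch of a discrete PID update, assuming a toy plant where the control output directly sets velocity; the gain values here are arbitrary, not tuned:

#include <stdio.h>

/* Discrete PID step: error history is reduced to a running integral
   and the previous error (for the derivative term). */
typedef struct {
    double kp, ki, kd;
    double integral;
    double prev_error;
} PID;

double pid_step(PID *pid, double setpoint, double measured, double dt) {
    double error = setpoint - measured;
    pid->integral += error * dt;
    double derivative = (error - pid->prev_error) / dt;
    pid->prev_error = error;
    return pid->kp * error + pid->ki * pid->integral + pid->kd * derivative;
}

int main(void) {
    PID pid = { .kp = 1.0, .ki = 0.1, .kd = 0.05 };
    double position = 7.0, setpoint = 10.0, dt = 0.1;
    for (int i = 0; i < 5; i++) {
        double u = pid_step(&pid, setpoint, position, dt);
        position += u * dt;            /* toy plant: velocity = control output */
        printf("step %d: u=%.3f pos=%.3f\n", i, u, position);
    }
    return 0;
}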

Motion Planning

Motion control alone doesn’t work in complex environments.

ROS Global Planner:

  • Uses A* or Dijkstra’s algorithm on a global costmap
  • Creates a global path with waypoints

ROS Local Planner:

  • Uses local costmap and dynamic window approach
  • Computes small, feasible trajectories around obstacles

Combine both: Global planner gives long-term direction; local planner handles short-term pathing and obstacle avoidance.

Maze Solving Algorithms:

  • BFS
  • DFS
  • Heuristic Informed (A*)
  • Optimal Informed (Dijkstra’s)
  • Iterative Deepening

Vertical Cell Decomposition (VCD):

  • Split environment vertically between obstacles
  • Build connections between accessible vertical strips

Example of VCD:

  • Obstacles represented as polygons
  • Extend vertical lines up/down from each vertex
  • Connect open areas → build graph

Other Planning Tools:

  • Visibility Graph: Straightest possible path, but may lead to sharp turns
  • Voronoi Diagram: Equidistant from obstacles → safer but longer path

Search Strategies:

  • Greedy Best-First: Always choose the node closest to the goal (not always optimal)
  • Uniform Cost Search (UCS): Expands path with lowest cost (same as BFS if all costs equal)
  • A*: Combines UCS and Greedy using cost + heuristic

Filtering (State Estimation)

Assumptions:

  • Markov: Only the previous state matters
  • Stationary process: Observation probabilities are constant
  • Prior state: Initial state is known or estimated

Types:

  • Discrete: Hidden Markov Models (HMM)
  • Continuous: Kalman Filter (linear + Gaussian)
  • Non-Gaussian: Particle Filter (samples possible states)

Joint Probability Distribution:

P(x_1, x_2, ..., x_t, z_1, z_2, ..., z_t) = P(x_1) * Π P(x_t | x_{t-1}) * Π P(z_t | x_t)

Prediction Step:

P(x_t | z_{1:t-1}) = ∫ P(x_t | x_{t-1}) * P(x_{t-1} | z_{1:t-1}) dx_{t-1}

Update Step:

P(x_t | z_{1:t}) ∝ P(z_t | x_t) * P(x_t | z_{1:t-1})

CS 367 - Systems Programming

Bitwise Operations

Everything in computers is in bits (1s and 0s), which are easy to store due to their binary nature (on/off).

Bitwise operations are performed on individual bits:

  • Shift operations
  • Bit masking
  • Logical operations (AND, OR, NOT, XOR)

Logical Shift: Used with unsigned integers. Leftmost bits are filled with 0s after a right shift.

Arithmetic Shift: Used with signed integers. Leftmost bits are filled with the sign bit (1 for negative, 0 for positive) to preserve the sign.

Example:

Logical Right Shift: 1100 0000 >> 2 = 0011 0000
Arithmetic Right Shift: 1100 0000 >> 2 = 1111 0000 (preserve sign)
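
A small C demo of the difference (note that right-shifting a signed value is technically implementation-defined in C, though mainstream compilers perform an arithmetic shift):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t u = 0xC0;       /* 1100 0000, unsigned */
    int8_t  s = -64;        /* same bit pattern 1100 0000, signed */

    /* Logical right shift: unsigned values shift in zeros. */
    printf("logical:    0x%02X\n", (uint8_t)(u >> 2));   /* 0x30 = 0011 0000 */

    /* Arithmetic right shift: signed values shift in copies of the sign bit. */
    printf("arithmetic: 0x%02X\n", (uint8_t)(s >> 2));   /* 0xF0 = 1111 0000 */
    return 0;
}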

Memory and Addressing

Memory is like a large array of bytes. Programs refer to data using addresses. Each process is assigned a private address space.

  • Word size: Number of bits CPU can address (e.g., 64-bit → 2^64 bytes)
  • Byte-ordering:
    • Little endian: least significant byte stored at the lowest address
    • Big endian: most significant byte stored at the lowest address

Example:

Value: 0x12345678
Little endian: 78 56 34 12
Big endian:    12 34 56 78
  • RISC: All instructions are the same (fixed) size
  • Negative values are represented using two’s complement
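
A small C check that prints the bytes of 0x12345678 in memory order, which reveals the machine's byte ordering:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t value = 0x12345678;
    unsigned char *bytes = (unsigned char *)&value;

    /* Print the bytes in increasing address order: a little-endian machine
       (e.g., x86) prints 78 56 34 12; a big-endian machine prints 12 34 56 78. */
    for (int i = 0; i < 4; i++)
        printf("%02X ", bytes[i]);
    printf("\n");

    printf("%s-endian\n", bytes[0] == 0x78 ? "little" : "big");
    return 0;
}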

Integer Casting & Overflow

  • Casting up:
    • If unsigned → pad with 0
    • If signed → pad with MSB
  • Casting down:
    • Truncation (may lose data)

Overflow Detection:

  • Unsigned: Overflow if sum < operands
  • Signed: Overflow if sign changes incorrectly

Integer Multiplication via Bit Shift:

x * 8 = x << 3
x * 12 = (x << 3) + (x << 2)

Integer Division via Bit Shift:

x / 8 = x >> 3 (only works for powers of 2)

Floating Point Representation

Notation: (-1)^s * M * 2^E

7-bit floating point layout:

  • 1 bit sign
  • 3 bit exponent
  • 3 bit fraction

Bias: bias = 2^(e-1) - 1

Example: 6.5

  • Binary: 110.1 = 1.101 x 2^2
  • Sign = 0
  • Frac = 101
  • Exp = 2 + 3 (bias) = 5 = 101
  • Final: 0 101 101

Types of Floating Point Values:

  • Normalized: 00...0 < exp < 11...1
  • Special: exp = 111 → INF/NAN
  • Denormalized: exp = 000

Denormalized Example: 3/16

  • Binary: 0.0011 = 1.1 x 2^-3
  • Bias = 3, Exp = -3 → 0 → denormalized
  • New definitions:
    • E = 1 - bias = -2
    • M = 0.11 → Frac = 110
  • Final: 0 000 110

Special Values:

  • 1 111 000 = -INF
  • 0 111 000 = +INF
  • 0 111 001 = NaN

Floating Point Multiplication:

(-1)^s1 * m1 * 2^e1 × (-1)^s2 * m2 * 2^e2
= (-1)^(s1 XOR s2) * (m1 × m2) * 2^(e1 + e2), then renormalize and round

Structs & Memory Alignment

Structs are laid out in memory in declaration order, and alignment rules dictate that data types must start at offsets that are multiples of their size.

Example:

struct c {
  char c;      // offset 0 (1 byte)
  int a[2];    // offset 4 & 8 (each int = 4 bytes)
  long b;      // offset 16 (8 bytes)
};
  • Total size = 24 bytes (padded to be multiple of 8)

Optimized Layout:

struct c {
  long b;      // offset 0
  int a[2];    // offset 8 & 12
  char c;      // offset 16
};
  • Padding only at the end
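
You can verify both layouts with offsetof and sizeof (struct names below are just for the demo). Note that on a typical 64-bit ABI both structs still occupy 24 bytes, because the whole struct is padded to a multiple of the 8-byte alignment of long; the optimized layout just moves all padding to the end:

#include <stdio.h>
#include <stddef.h>

struct original  { char c; int a[2]; long b; };
struct optimized { long b; int a[2]; char c; };

int main(void) {
    printf("original:  c=%zu a=%zu b=%zu size=%zu\n",
           offsetof(struct original, c), offsetof(struct original, a),
           offsetof(struct original, b), sizeof(struct original));
    printf("optimized: b=%zu a=%zu c=%zu size=%zu\n",
           offsetof(struct optimized, b), offsetof(struct optimized, a),
           offsetof(struct optimized, c), sizeof(struct optimized));
    return 0;
}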

Dynamic Allocation:

  • Double-word aligned (block addresses are multiples of 8)
  • Header contains size & allocation status

Allocation Strategies:

  • First fit, next fit, best fit
  • Free block merging to reduce fragmentation

Garbage Collection:

  • Memory blocks as nodes, pointers as edges
  • Unreachable nodes are removed

Exception Handling

Control Flow: Handles unexpected events (interrupts, system calls, faults)

Types of Exceptions:

  • Interrupts: Asynchronous (e.g., I/O devices)
  • Traps: System calls (e.g., read, fork, exec)
  • Faults: Illegal memory access (e.g., segmentation fault)
  • Aborts: Unrecoverable hardware errors

Microkernel vs Monolithic Kernel:

  • Microkernel: Minimal functionality in the kernel; most services run in user space (e.g., Minix, QNX; Windows and macOS are hybrid designs)
  • Monolithic: All OS services run in kernel space; fast and customizable, but a single bug can crash the whole kernel (e.g., Linux)

Context Switching: Kernel maintains:

  • Page table, file table, process table, registers, stack

Process Lifecycle:

  • fork() → Creates child process
  • exit() → Terminates process
  • wait() or waitpid() → Parent reaps child

Zombie State: Child terminated but not yet reaped

Signals: Notify process of events

  • Used for interprocess communication
  • Propagated to process groups (e.g., pipelines)

sigaction Example:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void handler(int sig) {
    printf("Caught signal %d\n", sig);   /* note: printf is not async-signal-safe; demo only */
}

int main(void) {
    struct sigaction sa;
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);            /* don't block extra signals inside the handler */
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);
    pause();                             /* wait here until Ctrl-C delivers SIGINT */
    return 0;
}

Race Condition Prevention: Block signals before forking, unblock afterward


Virtual Memory

Virtual memory gives each process an isolated address space. Pages are swapped in/out of disk.

  • Page fault = page not in memory → OS brings it in
  • Page Table maps virtual pages to physical frames
  • TLB (Translation Lookaside Buffer): Caches page table entries

Translation:

  • Virtual Address → VPN + Page Offset
  • VPN → TLB (or page table on a miss) → PPN; Physical Address = PPN + Page Offset

Full Virtual Address Translation Example

Simulates how a virtual address is translated to a physical address, and then used to access the cache, TLB, page table, and finally main memory.


System Configuration

Parameter Value
Virtual Address Size 16 bits
Physical Address Size 16 bits
Page Size 4 KB = 2¹² bytes
Offset Bits 12 bits
VPN Bits 4 bits
TLB Entries 4
Cache Lines 4
Cache Block Size 1 word (4 bytes)

Example Virtual Address

We’re translating:

Virtual Address (VA) = 0x1234
Binary: 0001 0010 0011 0100
  • VPN (Virtual Page Number) = first 4 bits = 0001 = 0x1
  • Page Offset = last 12 bits = 0x234

Step 1: TLB Lookup

TLB Table (before lookup):

Index VPN PPN Valid
0 0x0 0x3 1
1 0x2 0x4 1
2 0x3 0x6 1
3 0x4 0x2 1
  • Lookup VPN = 0x1
  • Not found → TLB Miss

Step 2: Page Table Lookup

Page Table:

VPN (Index) PPN Valid
0x0 0x3 1
0x1 0xA 1 ✅
0x2 0x4 1
0x3 0x6 1
  • VPN = 0x1 → PPN = 0xA
  • OS loads TLB entry: (VPN 0x1 → PPN 0xA)

Step 3: Construct Physical Address

Combine:

  • PPN (Physical Page Number) = 0xA
  • Offset = 0x234
Physical Address (PA) = (PPN << 12) | Offset
                      = (0xA << 12) | 0x234
                      = 0xA000 | 0x234
                      = 0xA234

Step 4: Cache Lookup

Let’s assume a direct-mapped cache with 4 lines:

  • Cache index bits: 2 bits → 2^2 = 4 lines
  • Block offset bits: 2 bits (1 word = 4 bytes per block)

Break PA (0xA234):

  • Binary: 1010 0010 0011 0100
  • Tag: upper 12 bits = 1010 0010 0011 (0xA23)
  • Index: bits 2–3 = 01 → Line 1
  • Offset: bits 0–1 = 00 (byte within the 4-byte word)

Cache Table:

Line Valid Tag Data
0 1 0xA20 0xFFFF
1 1 0xA21 0x1234
2 0
3 1 0xB10 0xF00D
  • Index = 1, Tag in PA = 0xA23 ≠ Tag in cache = 0xA21 → Cache Miss

Step 5: Access Main Memory

  • Cache miss → Load data from physical memory at 0xA234
  • Cache line 1 is updated with tag 0xA23 and value fetched

Summary Flow

Virtual Address = 0x1234
 ├──> TLB Lookup
 │     └── MISS
 │
 ├──> Page Table Lookup
 │     └── VPN 0x1 → PPN 0xA
 │
 ├──> Construct PA = 0xA234
 │
 ├──> Cache Lookup @ Index 1
 │     └── MISS (tag mismatch)
 │
 └──> Load from Physical Memory[0xA234] → Update Cache

Note: The cache is indexed and tagged using physical addresses.
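
A minimal C sketch of the VPN/offset split and physical-address construction from the example above (the PPN is hard-coded where the TLB/page-table lookup would go):

#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 12                    /* 4 KB pages */
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1)

int main(void) {
    uint16_t va  = 0x1234;
    uint16_t vpn = va >> OFFSET_BITS;     /* 0x1 */
    uint16_t off = va & OFFSET_MASK;      /* 0x234 */

    /* Pretend the TLB/page table returned PPN 0xA for VPN 0x1. */
    uint16_t ppn = 0xA;
    uint16_t pa  = (uint16_t)((ppn << OFFSET_BITS) | off);   /* 0xA234 */

    printf("VA=0x%04X -> VPN=0x%X, offset=0x%03X, PA=0x%04X\n",
           va, vpn, off, pa);
    return 0;
}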

Assembly Language

Compiled code translates to assembly, which consists of simple operations and direct memory access.

Operand Types:

  • $ = constant
  • % = register
  • 0xADDR = memory address

Example:

movq %rax, %rbx    # copy rax to rbx

Registers:

  • 8-bit: %al, %bl, %cl, %dl
  • 16-bit: %ax, %bx, %cx, %dx
  • 32-bit: %eax, %ebx, %ecx, %edx
  • 64-bit: %rax, %rbx, %rcx, %rdx

Jump Example:

start:
  cmp $1, %eax
  je equal
  jmp end

equal:
  mov $1, %ebx
end:

Caller-saved registers: the callee may clobber them, so the caller must save them if it needs their values after the call.
Callee-saved registers: the callee must save and restore them before returning.

C Function vs Assembly:

int add(int a, int b) {
  return a + b;
}

add:
  mov %edi, %eax   # copy a (first int argument) into the return register
  add %esi, %eax   # add b (second int argument)
  ret

Memory Locality and Hierarchy

Temporal Locality: Reuse of recently accessed data.
Spatial Locality: Accessing nearby data (e.g., arrays).

Hierarchy:

  1. Register – Fastest
  2. Cache – L1/L2
  3. Main Memory – DRAM
  4. Disk – Persistent storage

Static RAM (SRAM): Used for caches; fast but costly.
Dynamic RAM (DRAM): Used for main memory; dense and cheaper.

Performance Metrics:

  • Transfer latency
  • Rotational latency
  • Seek time

Cache Example

Address broken into segments:

  • Tag: Uniquely identifies block
  • Line: Row in cache
  • Offset: Byte within block

Example Table:

Address: 0x1234AB
Tag:     0x12
Line:    0x34
Offset:  0xAB

Formulas:

  • 2^(offset bits) bytes per block
  • 2^(line bits) sets in the cache
  • 2^(tag bits) memory blocks map to each set
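
A small C sketch extracting the fields from the 0x1234AB example with shifts and masks (the 8/8/8 bit split is an assumption that matches the table above):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t addr   = 0x1234AB;
    uint32_t offset = addr & 0xFF;          /* low 8 bits  -> 0xAB */
    uint32_t line   = (addr >> 8) & 0xFF;   /* next 8 bits -> 0x34 */
    uint32_t tag    = addr >> 16;           /* high bits   -> 0x12 */
    printf("tag=0x%02X line=0x%02X offset=0x%02X\n", tag, line, offset);
    return 0;
}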

CS 471 - Operating Systems

Note: Half of OS is a repeat of CS 367 because Computer Engineering students also take this course.


OS Refresher

  • Von Neumann architecture
  • Virtualization (Virtual Memory)
  • Provides system calls as standard library
  • Resource manager
  • Multithreading
  • Volatile vs Non-volatile memory
  • Abstractions
  • Provides protections
  • Isolates processes

Processes

  • Program: Code stored on disk
  • Process: When the code is loaded and running

States:

  • Running
  • Ready
  • Blocked
  • Terminated

Concurrency

  • Each thread has its own set of registers
  • Context switch with threads and processes works very similarly
  • State of each process → moved to Process Control Block (PCB)
  • State of each thread → moved to Thread Control Block (TCB)
  • Thread context switches do not usually involve paging (same address space)
  • Global variables must be protected to ensure mutual exclusion
  • Race conditions can lead to nondeterministic behavior

Direct Execution

  1. Create entry on process list
  2. Allocate memory for the program
  3. Load program into memory
  4. Set up stack and clear registers
  5. Run main
  6. Return from main
  7. Free memory
  8. Remove from process list

Problem: No time-sharing


Limited Direct Execution

  • Process runs in user mode (can access program files, registers)
  • For I/O: switch to kernel mode (has unrestricted access)

Trap Instruction → switch to kernel
Return from Trap → switch back to user mode

Execution Flow:

  1. Create entry on process list
  2. Allocate memory for the program
  3. Load program into memory
  4. Set up stack and clear registers
  5. Run main
  6. System call (trap)
  7. Save registers to stack
  8. Kernel mode
  9. Handle trap
  10. Return to user mode
  11. Restore registers
  12. Return from main
  13. Free memory
  14. Remove from process list

When to Context Switch

  • Option 1 (Cooperative): Switch when the process makes a system call
    • Problem: If no system calls, a program may hog the CPU
  • Option 2 (Non-cooperative): Timer interrupts trigger a switch

Two Saves:

  1. Timer interrupt → Hardware saves user register state using kernel stack
  2. OS switches process → Kernel register state saved by OS

CPU Scheduling

Assumptions:

  • Jobs run for a certain time
  • All arrive at the same time
  • Once started, runs to completion
  • No I/O
  • Runtime is known

Scheduling Algorithms:

  • FCFS: Simple, but short tasks wait for long ones
  • Shortest Job First: Optimal if jobs known in advance
  • Shortest Time to Completion: Preemptive version of SJF, allows shorter tasks to interrupt
  • Round Robin: Fixed time slice, good for time-sharing but can delay long tasks
  • With I/O: While one task does I/O, another can run

Real World: OS doesn’t know how long each task takes


Multilevel Feedback Queue (MLFQ)

  • Multiple priority queues
  • Jobs move up/down levels
  • Prevents starvation (but can still happen with many high-priority tasks)

Proportional Share Scheduling:

  • Each process gets CPU time in proportion to its assigned share (e.g., tickets or weights)
  • Ensures all processes get some CPU

Lottery Scheduling:

  • Tasks receive a number of tickets
  • Random winner per scheduling round
  • Prevents starvation using randomness
  • Example: Task A gets tickets 1-10, Task B gets 11-20

Linux Scheduler:

  • Completely Fair Scheduler (CFS)
  • Priority-based
  • Tracks virtual runtime
  • Uses Red-Black Tree to manage scheduling efficiently

Multiprocessor Scheduling:

  • Each CPU has its own cache
  • Requires hardware support

Memory: Base and Bounds (Dynamic Relocation)

  • Two registers: Base and Bounds
  • Physical Address = Virtual Address + Base
  • Bounds = upper memory limit
  • Prevents processes from accessing memory outside their segment → segfault
  • Base and bounds updated on each context switch
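
A minimal C sketch of the check the MMU performs under base and bounds (the base/bounds values are made up):

#include <stdio.h>
#include <stdint.h>

/* The MMU adds the base register to every virtual address and
   faults if the address exceeds the bounds. */
int translate(uint32_t va, uint32_t base, uint32_t bounds, uint32_t *pa) {
    if (va >= bounds) {
        fprintf(stderr, "segfault: VA 0x%X out of bounds\n", va);
        return -1;                 /* the OS would raise a fault here */
    }
    *pa = va + base;
    return 0;
}

int main(void) {
    uint32_t pa;
    if (translate(0x100, 0x4000, 0x1000, &pa) == 0)
        printf("VA 0x100 -> PA 0x%X\n", pa);     /* 0x4100 */
    translate(0x2000, 0x4000, 0x1000, &pa);      /* out of bounds -> fault */
    return 0;
}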

Livelock

  • Two processes constantly changing state without progress
  • Solution: add random backoff/delays before retrying so the processes stop mirroring each other

Data Transfer

  • Programmed I/O: CPU actively manages data transfer
  • DMA (Direct Memory Access): CPU writes data to memory, DMA controller handles transfer

Disk Performance

Latency = Seek Time + Rotational Latency + Transfer Time

Disk Scheduling:

  • FIFO: Worst performance
  • Shortest Positioning Time First (SPTF): Fast, but risks starvation
  • SCAN: Sweep from one end, services along the way (middle gets priority)
  • C-SCAN: Sweep one direction only, resets without servicing on return

RAID (Redundant Array of Inexpensive Disks)

Appears as one disk to OS, provides performance, capacity, and reliability.

  • RAID-0 (Striping):
    • No redundancy
    • Capacity = N
    • No fault tolerance
  • RAID-1 (Mirroring):
    • Store exact copy
    • Capacity = N/2
    • Fault-tolerant
  • RAID-4 (Parity):
    • One disk holds parity info
    • Capacity = N - 1
    • Higher latency for random writes
  • RAID-5 (Rotating Parity):
    • Parity spread across disks
    • Reduces write latency

CS 475 - Concurrent and Distributed Systems


Overview

  • Concurrent: Multiple tasks on the same computer, using scheduling to switch between them.
  • Parallelism: Multiple CPUs running tasks truly simultaneously.
  • Distributed: Multiple computers sharing a task over a network.

Multithreading

  • Each thread has its own stack (within the process’s address space) and its own control flow.
  • Shared memory (like global variables) → risk of race conditions.
  • Threading is less expensive (doesn’t require its own page table).

Race Condition Example: Thread 1 and Thread 2 both read value 5. Each subtracts 1. Both write 4 back → expected result was 3.

Solution: Use locks to restrict access to shared memory.
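
A minimal pthreads sketch of the fix: both threads decrement a shared counter, but the mutex makes the read-modify-write atomic (compile with -pthread):

#include <pthread.h>
#include <stdio.h>

int counter = 5;                       /* shared state */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *decrement(void *arg) {
    pthread_mutex_lock(&lock);         /* only one thread reads+writes at a time */
    counter = counter - 1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, decrement, NULL);
    pthread_create(&t2, NULL, decrement, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", counter); /* always 3 with the lock in place */
    return 0;
}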


Locks and Synchronization

  • Atomic Instruction: Read-modify-write performed as one indivisible step.
  • Basic Lock Functions:
    • acquire()
    • release()

C-style Mutex Locks

Ways to implement locks:

  1. Disable Interrupts:
    • Only viable for kernel code.
    • Not safe (can be abused).
  2. Peterson’s Algorithm:
    • Works for 2 threads.
    • Threads alternate.
    • Lots of waiting.
  3. Test and Set (Spinlock):
    • Spin until the lock becomes available.
    • Problem: still has race conditions.
  4. Atomic Test and Set:
    • Atomic check + acquire.
    • Disable interrupts only when acquiring.
  5. Compare and Swap:
    • If lock is 0 → set to 1, else wait.
    • Atomic by hardware.
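
A minimal sketch of option 5 using C11 atomics: atomic_compare_exchange_weak sets the lock to 1 only if it is currently 0, otherwise the caller keeps spinning:

#include <stdatomic.h>

/* Spinlock built on compare-and-swap. */
typedef struct { atomic_int locked; } spinlock_t;   /* initialize locked to 0 */

void spin_acquire(spinlock_t *l) {
    int expected = 0;
    while (!atomic_compare_exchange_weak(&l->locked, &expected, 1)) {
        expected = 0;   /* a failed CAS overwrites 'expected'; reset and retry */
    }
}

void spin_release(spinlock_t *l) {
    atomic_store(&l->locked, 0);
}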

Condition Variables

Used to control the order of execution among threads.

Bounded Buffer Problem:

  • Producer only adds if space is available.
  • Consumer only removes if buffer not empty.

Solution:

  • Use wait() and signal().
  • Always put wait() in a while loop to avoid spurious wakeups and race conditions.
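
One way to realize wait()/signal() is with POSIX condition variables; a compact bounded-buffer sketch (it treats the buffer as a stack purely for brevity):

#include <pthread.h>

#define CAP 8
int buf[CAP], count = 0;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

void put(int item) {
    pthread_mutex_lock(&m);
    while (count == CAP)                   /* while-loop guards against */
        pthread_cond_wait(&not_full, &m);  /* spurious wakeups          */
    buf[count++] = item;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&m);
}

int get(void) {
    pthread_mutex_lock(&m);
    while (count == 0)
        pthread_cond_wait(&not_empty, &m);
    int item = buf[--count];
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&m);
    return item;
}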

Semaphores

  • Work as both locks and condition variables.
  • Binary or counting semaphores.
#include <semaphore.h>

sem_t s;
sem_init(&s, 0, 1); // binary semaphore (initial value 1)
sem_wait(&s);       // acquire
sem_post(&s);       // release

Concurrent Data Structures

  • Concurrent Queue: One lock for the entire queue.
  • Concurrent Hashtable: One lock per chain (if chaining).
  • Concurrent Linked List: Two locks for adjacent nodes.

Distributed Systems

Architecture

  • RPC (Remote Procedure Call)
  • MOM (Message-Oriented Middleware)

Message Invocation Models

  • At least once – e.g., SunRPC
  • Exactly once – like a local function
  • At most once – e.g., Go/Java

Idempotency

  • An operation that has the same effect whether it is executed once or many times.

MapReduce

  • Input/output (I/O) is the bottleneck
  • Automates parallelization and distribution across machines
  • Input: (key, value)
  • map() → intermediate pairs
  • reduce() → merges all intermediate pairs

Time Synchronization

  • Ensure that requests happen in correct order

Techniques:

  • NTP (Network Time Protocol): syncs clocks over the network to servers ultimately disciplined by UTC sources (GPS/radio)
  • Cristian’s Algorithm: uses round-trip time to sync clocks
  • Lamport Clocks: happened-before logic using counters
  • Totally Ordered Multicast:
    • Broadcast update
    • Receive ACKs
    • Apply only when everyone has received

Vector Clocks

  • Vector of integers (one per process)
  • Captures causality

Rules:

  • Increment your own clock
  • Send your vector with messages
  • Upon receiving, merge vectors
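
A minimal C sketch of these rules for a fixed set of N processes (N = 3 is an assumption for the example):

#include <stdio.h>

#define N 3   /* number of processes */

/* Rule 1: increment your own entry on a local event or before sending. */
void vc_tick(int vc[N], int self) { vc[self]++; }

/* Rule 3: on receive, take element-wise maximums, then tick. */
void vc_merge(int vc[N], const int msg[N], int self) {
    for (int i = 0; i < N; i++)
        if (msg[i] > vc[i]) vc[i] = msg[i];
    vc_tick(vc, self);
}

int main(void) {
    int p0[N] = {0}, p1[N] = {0};
    vc_tick(p0, 0);          /* P0 local event: p0 = [1,0,0] */
    vc_tick(p0, 0);          /* P0 sends:       p0 = [2,0,0] */
    vc_merge(p1, p0, 1);     /* P1 receives:    p1 = [2,1,0] */
    printf("P1 clock: [%d,%d,%d]\n", p1[0], p1[1], p1[2]);
    return 0;
}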

System Models

  • Synchronous: Known message delay, global clock
  • Asynchronous: No timing assumptions

Failure Models

  • Crash-Fail: Deterministic, node crashes
  • Partition: Node split/network disconnection
  • Byzantine: Arbitrary failures (even malicious)

Distributed Mutual Exclusion

Assumptions:

  • Reliable network
  • Asynchronous processes
  • Crash-fail only

Approaches:

  • Centralized Lock Server:
    • All requests go to a single server
    • Server grants access
    • Problem: single point of failure
  • Ring-based Algorithm:
    • Token is passed
    • No central server
    • Problem: ring breakage
  • Shared Priority Queue (Lamport Queues):
    • Timestamps
    • Floods messages → overhead
  • Ricart & Agrawala:
    • Combines request and release
    • Optimizes Lamport’s idea

Distributed Transactions

  • Transactions must be atomic (all or nothing)
  • ACID properties

Two-phase locking:

  1. Growing phase: acquire locks (no releases)
  2. Shrinking phase: release locks (no new acquisitions)

Coordinator:

  • One node starts transaction
  • Decides to commit/abort

Two-Phase Commit (2PC)

  1. Voting Phase:
    • All nodes vote commit or abort
    • Once voted commit, cannot revert
  2. Commit Phase:
    • Actually apply changes

Failure Handling:

  • Participant crash → Ask coordinator for result
  • No response → Default to abort
  • Coordinator crash → Elect a new one or use previously sent result

Three-Phase Commit (3PC)

Adds a step to improve liveness:

  1. Vote
  2. Pre-commit broadcast
  3. Commit notice

Helps if coordinator crashes → everyone knows the decision.

Problem:

  • If half receive, half don’t → partition
  • Majority continues, minority syncs later

CAP Theorem

You can have only two out of:

  • Consistency
  • Availability
  • Partition Tolerance

Partitioning is non-negotiable in distributed systems.

ZooKeeper: Coordination service used for leader election, configuration, and locking in distributed systems.

CS 483 - Analysis of Algorithms

Note: For this class — just do a bunch of Leetcode to build intuition. Then, learn how to write proofs. That’s it. This is a very applied class; you won’t get much out of just reading theory.


Representative Problems

  1. Interval Scheduling Problem
    • Solved using greedy algorithms
  2. Weighted Interval Scheduling
    • Similar to interval scheduling but with priorities
    • Solved using dynamic programming
  3. Bipartite Matching
    • Pairings and maximum matchings in bipartite graphs
  4. Network Flow
    • Flow problems on graphs (e.g., Ford-Fulkerson)
  5. NP-Complete Problems
    • Hard to solve efficiently
    • Easy to verify solutions
    • Example: Competitive Facility Location
      • Find optimal spacing among facilities

Complexity Classes

  • NP-Complete: Solutions are easy to verify but potentially hard to compute
  • PSPACE-Complete: Solvable with polynomial memory; believed even harder than NP-complete (no known way to verify solutions efficiently)

Algorithm Paradigms

Divide and Conquer

  • Break problem into subproblems
  • Solve recursively
  • Combine solutions

Greedy Algorithms

  • Construct solution step-by-step
  • Always choose the best decision at the current step
  • Hope that local optimum → global optimum

Dynamic Programming

  • Consider all possible solutions, choose the best
  • Look for repeated subproblems
  • Two main approaches:
    • Memoization: Top-down + recursion + caching
    • Tabulation: Bottom-up + iterative + often array-based

When to use DP?

  • Recursive solutions with repeated calculations
  • Can transform recursion → memoization → tabulation
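
A small C example of the same problem (Fibonacci) solved both ways: fib_memo is top-down with a cache, fib_tab is bottom-up:

#include <stdio.h>

/* Memoization: top-down recursion with a cache of computed values. */
long long memo[64];
long long fib_memo(int n) {
    if (n <= 1) return n;
    if (memo[n] != 0) return memo[n];
    return memo[n] = fib_memo(n - 1) + fib_memo(n - 2);
}

/* Tabulation: bottom-up, fill an array starting from the base cases. */
long long fib_tab(int n) {
    long long table[64] = {0, 1};
    for (int i = 2; i <= n; i++)
        table[i] = table[i - 1] + table[i - 2];
    return table[n];
}

int main(void) {
    printf("%lld %lld\n", fib_memo(40), fib_tab(40));  /* both 102334155 */
    return 0;
}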

Graph Algorithms

  • BFS (Breadth-First Search)
  • DFS (Depth-First Search)
  • Dijkstra’s Algorithm (shortest path)
  • Topological Sort (for DAGs)
  • Minimum Spanning Tree (MST)
    • Prim’s Algorithm
    • Kruskal’s Algorithm

Proof Techniques

  • Lemma + Corollary + Subproblem + Algorithm format
  • Inductive Reasoning:
    • Base case
    • Inductive step (assume n)
    • Prove for n+1

CS 480 - Introduction to Artificial Intelligence

A combination of search, logic, learning, and reasoning. Examples and proofs help more than pure theory memorization.

Constraint Satisfaction Problems (CSPs)

  • Identification problem: We care if a state is a goal state, not how we got there.
  • Components:
    • Variables: Set of objects that will take on values
    • Domain: Set of all possible values for each variable
    • Constraints: Restrictions on variable values with respect to other variables
  • Complexity: Usually NP-hard
  • Typical solver: Backtracking

Arc Consistency:

  • Represent each undirected edge as two directed arcs
  • Maintain arcs such that for every value in variable A, there’s a consistent value in variable B
  • Can be thought of as an adjacency list

k-Consistency:

  • Extends arc consistency to sets of k variables

Forward Checking:

  • Eliminate inconsistent values ahead of time to reduce backtracking

Heuristics:

  • Minimum Remaining Value (MRV): Choose the variable with the fewest legal values
  • Least Constraining Value (LCV): Choose the value that eliminates the fewest choices for the neighboring variables

Tree-structured CSP:

  • Can be solved efficiently using topological sort

Local Search for CSPs:

  • Start from a random assignment
  • Iteratively reassign values to minimize conflicts
  • Constant space, but incomplete and suboptimal as the constraint-to-variable ratio increases

Example: Map Coloring

  • Variables: {WA, NT, SA, Q, NSW, V, T}
  • Domain: {Red, Green, Blue}
  • Constraint: Adjacent regions must not share the same color

Example: N-Queens (4x4)

  • Queens: Q1–Q4 each in different columns
  • Place Q1 in row 2, Q2 in row 4, Q3 in row 1, Q4 in row 3 → No queen attacks another

Peak Finding

Hill Climbing:

  • Always move towards increasing value
  • May get stuck at local maxima
  • Incomplete

Simulated Annealing:

  • Occasionally allows downhill moves to escape local maxima
  • Balances exploration and exploitation

Genetic Algorithm:

  • Local beam search with crossover and mutation
  • Explore multiple paths in parallel and favor the most promising

Search Problems

  • World State: All information about the environment
  • Search State: Minimal info needed for search algorithm

Uninformed Strategies:

Algorithm Complete Optimal Notes
Depth-First Search No No Can get stuck in loops
Breadth-First Search Yes Yes If all edges have equal cost
Uniform Cost Search Yes Yes Explores lowest cost path first
Iterative Deepening Yes Yes Combines BFS + DFS
Bidirectional Search Yes Yes If both searches use UCS

Informed Strategies:

Algorithm Complete Optimal Notes
Greedy Search No No May get stuck in loops
A* Search Yes Yes Requires admissible and consistent heuristic
  • Good heuristic: Always underestimates cost to goal

Minimax — Step-by-Step Example

            MAX
         /       \
      MIN         MIN
     /   \       /   \
    3     5     2     9
  1. Evaluate the left MIN:
    • Children: 3, 5 → MIN = 3
  2. Evaluate the right MIN:
    • Children: 2, 9 → MIN = 2
  3. MAX chooses the maximum of 3 and 2 → Best move = Left Subtree → Value = 3
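
A minimal recursive C sketch of minimax over this exact tree, storing only the leaf values:

#include <stdio.h>

/* Recursive minimax over a fixed-depth binary game tree stored as an
   array of leaf values (depth 2 here, matching the example above). */
int minimax(const int *leaves, int lo, int hi, int maximizing) {
    if (hi - lo == 1) return leaves[lo];              /* leaf node */
    int mid = (lo + hi) / 2;
    int left  = minimax(leaves, lo, mid, !maximizing);
    int right = minimax(leaves, mid, hi, !maximizing);
    if (maximizing) return left > right ? left : right;
    return left < right ? left : right;
}

int main(void) {
    int leaves[] = {3, 5, 2, 9};
    /* Root is MAX -> children are MIN -> leaves */
    printf("minimax value: %d\n", minimax(leaves, 0, 4, 1));  /* 3 */
    return 0;
}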

Alpha-Beta Pruning Example (using above tree)

            MAX
         /       \
      MIN         MIN
     /   \       /   \
    3     5     2     9
  • Alpha = -inf, Beta = inf
  • Left MIN: 3 (Beta = 3), then 5 → no pruning
  • Right MIN: 2 (Beta = 2) < Alpha (3) → PRUNE 9

Expectimax

  • MIN nodes replaced with average of possible outcomes
         MAX
       /     \
    Chance   Chance
    /   \     /   \
   3    5   2     9
  • Chance 1: (3+5)/2 = 4
  • Chance 2: (2+9)/2 = 5.5
  • MAX picks 5.5 → right branch

Sampling Methods in Bayes Nets

Bayes Net Example:

       Cloudy
       /    \
      v      v
   Rain    Sprinkler
      \      /
       v    v
     WetGrass

  • P(Cloudy) = 0.5
  • P(Rain | Cloudy) = 0.8
  • P(Sprinkler | Cloudy) = 0.1
  • P(WetGrass | Rain, Sprinkler) = 0.99

Prior Sampling

  • Generate samples from root → leaves
  • Sample: Cloudy = True → Rain = True → Sprinkler = False → WetGrass = True
  • Repeat for 1000 samples, estimate P(WetGrass=True)

Rejection Sampling

  • Generate prior samples, discard those not matching evidence (e.g., WetGrass=True)

Likelihood Weighting

  • Fix evidence (e.g., WetGrass=True)
  • Weight each sample based on probability of evidence given parents

Gibbs Sampling

  • Initialize all variables randomly
  • Repeatedly sample each non-evidence variable conditioned on Markov blanket

CS 455 - Networking

Internet Overview

  • The Internet is a global computer network interconnecting billions of users.
  • End systems: Devices like computers and servers.
  • ISPs (Internet Service Providers): Provide access to the internet.
    • Local ISP → Regional ISP → Tier 1 ISP

Packet Transmission

  • Data sent as packets.
  • Packets travel through links (fiber optics, coaxial cable, wireless) and packet switches.

Types of Network Delay

  1. Processing Delay: Time to examine packet header and check for errors.
  2. Queueing Delay: Time waiting in buffer before transmission.
  3. Transmission Delay: Time to push bits onto the link.
  4. Propagation Delay: Time to travel across the link.

Throughput = Your share of available bandwidth.


TCP/IP Model

Layer Protocols Notes
Application HTTP, SMTP, FTP, DNS End-to-end communication
Transport TCP, UDP End-to-end delivery. TCP: reliable, UDP: fast
Network IP Routing and forwarding between hosts
Link Ethernet, Wi-Fi Transmission between nodes
Physical Coaxial, Fiber Optic Signal transmission medium

Encapsulation:

[Link Header [Network Header [Transport Header [Application Header [Data]]]]]

Network Security

  • DoS Attack: Overwhelms target with traffic
  • Packet Sniffing: Monitors data
  • IP Spoofing: Fake IP addresses
  • Mitigation: Endpoint authentication, encryption

Application Layer

Models

  • Client-Server
  • Peer-to-Peer (P2P)

Socket

  • Interface for process-to-process communication

Key Considerations:

  • Reliability (data guaranteed?)
  • Throughput (rate of data)
  • Loss Tolerance (okay to drop packets?)
  • Timing (real-time required?)
  • Security (data integrity/authentication?)

DNS (Domain Name System)

  • Maps domain names to IP addresses
  • Uses UDP

DNS Hierarchy:

Client → Local DNS Resolver
         |
         ↓
     [Root Server]
         |
         ↓
     [TLD Server]
         |
         ↓
[Authoritative Server]
         |
         ↓
     Final IP Address

Search Types:

  • Iterative: Resolver walks through each level
  • Recursive: Resolver asks root, which queries each level

DNS Record Types:

  • A → Hostname to IP
  • NS → Domain to name server
  • CNAME → Alias to canonical
  • MX → Mail exchange

HTTP

  • Runs on TCP; stateless protocol (connections may be nonpersistent or persistent)
  • Cookies: Track sessions
  • Web Caches: Reduce latency and load

SMTP

  • Transfers mail over TCP
  • User Agent → Mail Server → Recipient Server
  • Retrieving/reading mail uses POP3, IMAP, or HTTP (webmail)

Transport Layer

Reliable Data Transfer

  • Checksum: Detect bit errors
  • ACK/NACK: Confirm or reject packets
  • Sequence Numbers: Ensure correct ordering
  • Timeouts: Retransmit if no ACK received
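
As a concrete example of checksumming, here is a sketch of the classic 16-bit ones'-complement Internet checksum (the style used by IP/UDP/TCP); the sample packet bytes are arbitrary:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Sum the data as 16-bit words, fold carries back in, then invert. */
uint16_t checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(data[i] << 8 | data[i + 1]);
    if (len & 1)                         /* pad an odd trailing byte */
        sum += (uint32_t)(data[len - 1] << 8);
    while (sum >> 16)                    /* fold carries into the low 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

int main(void) {
    uint8_t pkt[] = {0x45, 0x00, 0x00, 0x1C};
    printf("checksum = 0x%04X\n", checksum(pkt, sizeof pkt));
    return 0;
}

The receiver recomputes the sum including the checksum field; an error-free packet sums to 0xFFFF.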

Pipelining

  • Go-Back-N (GBN):
    • Sliding window
    • Cumulative ACK
    • One timer per window
Sender Window Size = 4
Packets Sent: [0, 1, 2, 3] → Awaiting ACKs

ACK for 0 received → Slide window to [1, 2, 3, 4]
Packets 1 and 2 lost in transit
Packet 3 arrives out of order → Receiver discards it and re-ACKs 0 (cumulative)

Timeout for Packet 1 → Resend: 1, 2, 3, 4
  • Selective Repeat:
    • Individual ACKs
    • Timer per packet
Sender Window Size = 4
Packets Sent: [0, 1, 2, 3]
ACKs received for 0 and 2
Timer expires only for packet 1 → Resend packet 1

Only missing packet is resent → Bandwidth-efficient

UDP

  • Lightweight, best-effort delivery, optional checksum

TCP

  • Reliable byte-stream service
  • Combines Go-Back-N and Selective Repeat
  • ACK = next expected byte
  • Fast retransmit: After 3 duplicate ACKs
  • Timeouts: retransmission timeout derived from an exponentially weighted moving average of RTT samples
Sequence: [1000, 1001, 1002]
ACKs: ACK=1003 (bytes up through 1002 received; 1003 is the next expected byte)

Fast Retransmit:
Duplicate ACKs for 1001 ×3 → Immediate resend of 1001

TCP Flow Example:

Client                         Server
  | SYN ---------------------> |
  |                            |
  | <----------------- SYN-ACK |
  | ACK ---------------------> |

Congestion Control:

  • Additive Increase, Multiplicative Decrease
  • Slow Start: Exponential ramp-up
  • Congestion Avoidance: Linear increase post-threshold
CWND (Congestion Window Size)
^
|        /\
|       /  \
|      /    \
|_____/      \_______
       Additive     Multiplicative
       Increase       Decrease
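
A toy C simulation of the sawtooth above, assuming a loss is detected every 12 RTTs (real TCP reacts to actual loss/duplicate-ACK signals and adds fast recovery):

#include <stdio.h>

/* cwnd grows exponentially during slow start, linearly after the
   threshold, and is halved when a loss is (pretend) detected. */
int main(void) {
    double cwnd = 1, ssthresh = 16;
    for (int rtt = 1; rtt <= 30; rtt++) {
        if (rtt % 12 == 0) {              /* simulated loss */
            ssthresh = cwnd / 2;
            cwnd = ssthresh;              /* multiplicative decrease */
        } else if (cwnd < ssthresh) {
            cwnd *= 2;                    /* slow start */
        } else {
            cwnd += 1;                    /* additive increase */
        }
        printf("RTT %2d: cwnd = %.1f\n", rtt, cwnd);
    }
    return 0;
}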

Network Layer

IP Model

  • Best-effort: No delivery guarantees

Planes

  • Data Plane: Packet forwarding
  • Control Plane: Path computation (routing)

DHCP (Dynamic Host Configuration Protocol)

  • Assigns IP addresses dynamically
  • Runs over UDP

DHCP Message Exchange:

  1. Discovery (Host → 255.255.255.255)
  2. Offer (Server)
  3. Request (Host)
  4. ACK (Server)

NAT (Network Address Translation)

  • Private IP → Public IP + Port
  • Reduces global IP usage

VPN

  • Adds encrypted headers (tunneling)

IPv6 Transition

  • Uses tunneling to wrap IPv6 in IPv4 headers

Devices

| Device   | Function                     |
|----------|------------------------------|
| Router   | Forwards packets (network)   |
| Switch   | Forwards frames (link)       |
| Firewall | Permits or denies traffic    |
| NAT      | IP translation for networks  |


Routing Protocols

  • Link-State:
    • Flood costs to all
    • Compute shortest paths (e.g., Dijkstra)
  • Distance Vector:
    • Bellman-Ford algorithm
    • Slower convergence

Routing Domains

  • BGP: Inter-AS (between networks)
  • OSPF: Intra-AS (within networks)

Forwarding

  • Uses forwarding tables
  • Matches IP address prefix to interface
| Prefix        | Interface |
|---------------|-----------|
| 192.168.1.0/24| eth0      |
| 10.0.0.0/8    | eth1      |
| 0.0.0.0/0     | eth2      | ← Default route

Packet Dest: 192.168.1.34 → eth0
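
A minimal C sketch of longest-prefix matching over this table (routes are hard-coded; a real router uses a trie or TCAM):

#include <stdio.h>
#include <stdint.h>

/* Prefixes are stored as (network, prefix length, interface);
   the most specific matching prefix wins. */
struct route { uint32_t net; int len; const char *iface; };

struct route table[] = {
    { 0xC0A80100, 24, "eth0" },   /* 192.168.1.0/24    */
    { 0x0A000000,  8, "eth1" },   /* 10.0.0.0/8        */
    { 0x00000000,  0, "eth2" },   /* 0.0.0.0/0 default */
};

const char *lookup(uint32_t dst) {
    const char *best = NULL;
    int best_len = -1;
    for (int i = 0; i < 3; i++) {
        uint32_t mask = table[i].len ? 0xFFFFFFFFu << (32 - table[i].len) : 0;
        if ((dst & mask) == table[i].net && table[i].len > best_len) {
            best = table[i].iface;
            best_len = table[i].len;
        }
    }
    return best;
}

int main(void) {
    uint32_t dst = (192u << 24) | (168u << 16) | (1u << 8) | 34;  /* 192.168.1.34 */
    printf("forward via %s\n", lookup(dst));                      /* eth0 */
    return 0;
}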

Link Layer

  • Transmits between adjacent nodes using MAC addresses

ARP (Address Resolution Protocol)

  • IP → MAC address
  • Uses broadcast to discover addresses

Channel Access Methods

Protocol Description
FDMA Divide channel by frequency
TDMA Divide channel by time
CDMA Assign unique codes to users
Slotted ALOHA Time slots; retry on collision
Pure ALOHA No time slots; even higher collision rate
CSMA Sense channel before sending
CSMA/CD Detect collision and stop transmission

FDMA

User 1 → 0–1 MHz
User 2 → 1–2 MHz
User 3 → 2–3 MHz

Each gets a frequency band.

TDMA

Time Slot Assignment:
Slot 1: User A
Slot 2: User B
Slot 3: User C

Users transmit in their time window only.

CDMA

User 1: Code 1001
User 2: Code 1100

Both send at once, decoded at receiver using code correlation.

ALOHA

User 1 sends packet → Collision with User 2
Wait random time, retransmit

Pure ALOHA: Random anytime
Slotted ALOHA: Only at slot boundaries

CSMA/CD Example

User 1 senses channel → Idle → Transmit
Collision occurs with User 2
Both stop immediately
Wait random time → Retry

Switches vs Routers

Device Layer Purpose
Switch Link Forward & flood frames
Router Network Forward packets