class notes

Crash Course GMU

A compilation of all the notes from all my CS classes at GMU – the ultimate revision guide.

Why did I take the time to compile this? By my final year of college, everything felt like a blur, and it seemed like I had learned nothing. Of course, after taking so many classes, I had definitely gained knowledge. However, I wanted to do one grand revision of everything I had learned, from the beginning to the very end.

My notes are neither the most organized nor fully comprehensive and should not be used as a cram study guide for final exams (you will undoubtedly fail). They are meant to offer a somewhat detailed summary of what you might learn over four years, along with random facts I found interesting. You’ll notice the quality of notes declines as the semester progresses, with a steep drop after midterm season. Alas, such is the life of a student.

Happy studying!

All Classes

Links to jump to any class you need help with:

Undergraduate Computer Science Coursework
CS 110 Essentials of Computer Science
CS 211 Object-Oriented Programming
STAT 344 Probability and Statistics
CS 262 Introduction to Low Level Programming
CS 310 Data Structures
CS 330 Formal Methods and Models
CS 484 Data Mining
CS 478 Natural Language Processing
CS 485 Autonomous Robotics
CS 367 Computer Systems Programming
CS 471 Operating Systems
CS 475 Concurrent and Distributed Systems
CS 483 Analysis of Algorithms
CS 480 Intro to Artificial Intelligence
CS 455 Networking

CS 110 - Essentials of Computer Science

The history of computer science spans several generations, with different tools and engineers marking each period:

  • Vacuum Tubes – Systems programmers wrote programs to convert system languages to machine languages.
  • Transistors – Application programmers wrote instructions for computers to execute operations.
  • Circuit Boards & Operating Systems – Non-programmers could start using computers for tasks.
  • Large-Scale Integration (Silicon Chips) – Enabled word processing and more advanced software.
  • Interconnected CPUs & GPUs – Widespread computer usage and the rise of the internet.

Notable Figures:

  • Ada Lovelace – Often considered the first female programmer. Her work built upon Babbage’s Analytical Engine.
  • Charles Babbage – Designed the Analytical Engine during the Industrial Revolution to automate calculations. Inspired by the Jacquard loom’s punch cards, which allowed for repeated patterns.
  • Alan Turing – Father of theoretical computer science. Developed the Bombe to crack Enigma codes during WWII. His Turing Machine and Turing Test remain foundational in AI research.

Key Concepts:

  • Binary (Base 2): 0-1 (1 bit)
  • Octal (Base 8): 0-7 (3 bits)
  • Decimal (Base 10): 0-9
  • Hexadecimal (Base 16): 0-F (4 bits)
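
To sanity-check base conversions quickly, Python’s built-in conversion functions work well; a minimal sketch:

n = 0b101010                      # binary literal (42 in decimal)
print(bin(n), oct(n), hex(n))     # 0b101010 0o52 0x2a
print(int("2A", 16))              # parse a hexadecimal string -> 42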

Data Compression

Reducing storage space needed for data.

  • Lossless Compression – No data is lost during compression.
  • Compression Ratio – How much a file is reduced in size.
  • Bandwidth – The amount of data that can be transferred at a time.

Types of Encoding:

  • Keyword Encoding – Replaces common words with shorter representations (e.g., “and” → “&”).
  • Run-Length Encoding – Compresses runs of repeated characters (e.g., 1111110001111 → 6 1s, 3 0s, 4 1s → 613041).
  • Huffman Encoding – More frequent characters get shorter bit representations.
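
A minimal run-length encoder sketch (in Python) matching the example above; itertools.groupby does the run grouping:

from itertools import groupby

def run_length_encode(s: str) -> str:
    """Collapse runs of repeated characters into (count, character) pairs."""
    return "".join(f"{len(list(group))}{ch}" for ch, group in groupby(s))

print(run_length_encode("1111110001111"))  # -> "613041": six 1s, three 0s, four 1s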

Digital signals are preferred for data transfer because they are resistant to environmental interference.

Computer Architecture

  • Virtual Machines (VMs): Emulations of physical computers stored as files (“images”).
  • Hypervisor: Software that creates and manages VMs.
  • Cloud Computing: Remote data storage and computing services with concerns over security and third-party reliance.
  • Heap Memory: Used by all parts of a program and dynamically allocated.
  • Stack Memory: Used for local variables and function calls; prone to overflow.

Storage & Hardware

  • Solid State Drives (SSD): Fast, durable, non-mechanical.
  • Hard Disk Drives (HDD): Higher storage capacity, mechanical, cheaper but more fragile.
  • CPU (Central Processing Unit): Executes programs; includes the ALU (Arithmetic Logic Unit), registers, and control logic.

CS 211 - Object-Oriented Programming

Compilation Process

  • .java files are converted to .class files by the Java compiler.
  • Java does not have an assembler and only requires a compiler to convert .java files into bytecode.
  • The Java Virtual Machine (JVM) generates machine code at runtime, acting as an interpreter.
  • Advantage: Since Java includes its own runtime environment, it can be executed on any device. This was likely one of the primary reasons Java was created.

Tokens in Java

Java has six types of tokens:

  1. Reserved Keywords – if, else, int, for, while
  2. Identifiers – x, i, objName, varName
  3. Literals – 7, "hello", 8.9
  4. Operators – +, -, /, *
  5. Separators – [], {}, ()
  6. Terminators – ;

Example:

int count = 7; // reserved, identifier, operator, literal, terminator

Increment Operators

int x = 9;
  • x++ → Evaluates x, then increments it.
    • print(x++) → 9
    • print(x) → 10
  • ++x → Increments x, then evaluates it.
    • print(++x) → 10
    • print(x) → 10

Variable Types in Java

  • Class Variables: Shared across all objects of the class. Changing the variable through one instance affects every instance. Declared using the static keyword.
  • Instance Variables: Each object has its own copy. Modifying one object’s variable does not affect others.
  • Local Variables: Exist only within a method. Once the method returns, the variable’s scope ends.

Primitive Data Types & Their Sizes

Integers
  • byte - 1 byte
  • short - 2 bytes
  • int - 4 bytes
  • long - 8 bytes
Floating-Point Numbers
  • float - 4 bytes
  • double - 8 bytes
Characters
  • char - 1 byte
Booleans
  • boolean - 1 byte (size may vary)

Reference Data Types

  • Any non-primitive type or dynamically created object.
  • Includes:
    • Data structures (e.g., stacks, queues)
    • Objects of a class
Autoboxing & Unboxing
  • Autoboxing: Converting a primitive to a reference type.
  • Unboxing: Converting a reference type to a primitive.

Type Conversion & Typecasting

Implicit Conversion

  • Converts to a larger data type (no data loss).
    int x = 4;
    long y = x;
    

Explicit Conversion (Typecasting)

  • Required when converting to a smaller type.
    long x = 4;
    int y = (int) x;
    

Object References & Comparisons

Demo d = new Demo(); // Creates a new object
Demo d1 = d;         // d1 now references the same object as d
  • d and d1 refer to the same object.
  • d == d1 → true

Jagged Arrays

int[][] array = new int[2][];
array[0] = new int[3];
array[1] = new int[4];

Strings in Java

Creating Strings

String s = "hey";
String t = new String("hi");
  • s == t returns false because their references are being compared.

Correct String Comparison

s.compareTo(t) == 0   // or s.equals(t)

String Immutability

  • Strings in Java are immutable.
  • Operations like += or " " + " " create a new string object.
  • This can cause performance issues in large applications.
  • Use StringBuilder for efficient string manipulation.

Java I/O

  • System.in
  • System.out
  • System.err

Object-Oriented Programming (OOP) Concepts

Classes & Methods
  • Class: A collection of fields (data) and methods (functions).
  • this Reference:
    • Refers to the actual instance variables of a class.
    • Useful in constructors to differentiate between instance variables and input arguments.
Cascading Constructors
class Person {
    Person() {
        this("hi", 2); // Calls the constructor with two parameters
    }

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

Instance vs. Static Methods

  • Instance Methods → Require an object to be called.
  • Static Methods → Cannot access instance variables, but can access static variables.

Method Overloading & Overriding

  • Overloading: Same method name, different parameters.
  • Overriding: Same method name, same parameters, redefined in a subclass.

Encapsulation: Access Modifiers

| Modifier  | Access Level |
|-----------|--------------|
| public    | Accessible from any class |
| private   | Accessible only within the same class |
| protected | Accessible within the same package and subclasses |
| (default) | Accessible within the same package |

Inheritance

  • Defines relationships between parent and child classes (is-a relationship).
  • final class → Cannot be extended.
  • abstract class → Cannot be instantiated directly; must be extended.

Polymorphism

  • Static Polymorphism → Method overloading.
  • Dynamic Polymorphism → Method overriding.

STAT 344 Probability and Statistics

Statistics measures variability. Probability measures uncertainty. No. I have no idea what that means.

Types of Models:

  • Mechanistic → Based on theory/principles (e.g., gas laws)
  • Empirical → Applied science and math
  • Regressional → Builds on empirical models and offers predictions
  • Probabilistic → Attempts to quantify model errors

Fundamental Concepts:

  • Random Experiment: Same input leads to different outputs.
  • Sample Space: Set of all possible outcomes.
    • Discrete Sample Spaces → Finite, quantifiable outcomes.
    • Continuous Sample Spaces → Infinite outcomes (e.g., number line).
  • Event: A subset of the sample space, typically denoted with a capital letter.

Axiomatic Probability:

  • Axiom/Postulate: A statement assumed to be true.
  • Axiomatic System Features:
    • Consistent → No contradictions.
    • Independent → Axioms cannot be derived from each other.
    • Complete → Everything can be derived from axioms.

Conditional Probability:

[ P(A|B) = \frac{P(A \cap B)}{P(B)} ]
Independence of Events:

  • ( P(A | B) = P(A) )
  • ( P(A \cap B) = P(A) \times P(B) )

Bayes’ Theorem

(Came back during NLP Sentiment Analysis)
[ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} ]
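
A tiny numeric sanity check of Bayes’ Theorem in Python — the probabilities here are made up purely for illustration:

# Hypothetical numbers: P(B|A) = 0.9, prior P(A) = 0.01, evidence P(B) = 0.05.
p_b_given_a, p_a, p_b = 0.9, 0.01, 0.05
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.18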

Distribution Functions

Probability distribution → Probabilities associated with specific values of ( x ).

Probability Mass Function (PMF)

Example:

  • ( f(x_i) > 0 ) for discrete values
  • Cumulative Distribution Function (CDF) → running sum of the PMF (integral of the PDF in the continuous case)

Properties of the CDF

  • Describes a probabilistic function.
  • Non-decreasing.
  • Defined for all real numbers.
  • As ( x \to +\infty ), ( P = 1 ); as ( x \to -\infty ), ( P = 0 ).

Example Discrete CDF:

( P(X = x) = \frac{1}{6}, ) for ( x \in {1, 2, 3, 4, 5, 6} ).

Piecewise function:
[ F(x) = \begin{cases}
0, & x < 1 \\
1/6, & 1 \leq x < 2 \\
1/3, & 2 \leq x < 3 \\
1/2, & 3 \leq x < 4 \\
2/3, & 4 \leq x < 5 \\
5/6, & 5 \leq x < 6 \\
1, & x \geq 6
\end{cases} ]

Special Discrete Distribution Functions:

  1. Discrete Uniform Distribution
    • Description: Equally likely outcomes over integer range ([a, b]).
    • Expectation: ( E(X) = \frac{a + b}{2} )
    • Variance: ( \frac{(b-a+1)^2 - 1}{12} )
    • Example: Fair die roll ((a=1, b=6), E(X) = 3.5, Var(X) \approx 2.92 ).
  2. Binomial Distribution (Extension of Bernoulli trials with defined trials)
    • Description: Number of successes in ( n ) independent trials.
    • PMF: ( P(X=k) = C(n,k) p^k (1-p)^{n-k} )
    • Expectation: ( np )
    • Variance: ( np(1-p) )
    • Example: 10 coin flips ((p=0.5), E(Heads) = 5, Var(X) = 2.5 ).
  3. Geometric Distribution
    • Description: Trials until first success.
    • PMF: ( P(X=k) = (1-p)^{k-1} p )
    • Expectation: ( 1/p )
    • Variance: ( (1-p)/p^2 )
    • Example: Drawing cards until an ace appears ((p=4/52), E(Trials) = 13, Var(X) = 156).
  4. Negative Binomial Distribution
    • Description: Trials until ( r ) successes.
    • PMF: ( P(X=k) = C(k-1, r-1) p^r (1-p)^{k-r} )
    • Expectation: ( r/p )
    • Variance: ( r(1-p)/p^2 )
    • Example: Sales calls until 3 sales ((p=0.2), E(Calls) = 15, Var(X) = 60).
  5. Hypergeometric Distribution
    • Description: Successes in draws without replacement (e.g., card games).
    • PMF: ( P(X=k) = \frac{C(K,k) C(N-K,n-k)}{C(N,n)} )
    • Expectation: ( n \times (K/N) )
    • Variance: ( n \times (K/N) \times (1-K/N) \times \frac{N-n}{N-1} )
    • Example: Expected aces in a 7-card hand from a standard deck: ( E(Aces) = 7 \times (4/52) \approx 0.54 ).
  6. Poisson Distribution
    • Description: Events in a fixed interval with known rate.
    • PMF: ( P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!} )
    • Expectation: ( \lambda )
    • Variance: ( \lambda )
    • Example: Call center receives 5 calls/hour, ( E(Calls \text{ in 2 hrs}) = 10, Var(X) = 10 ).
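
A quick way to check the expectation/variance formulas above is to plug them in directly; this small sketch reproduces the die, coin-flip, and ace-drawing examples:

def uniform_stats(a, b):
    return (a + b) / 2, ((b - a + 1) ** 2 - 1) / 12

def binomial_stats(n, p):
    return n * p, n * p * (1 - p)

def geometric_stats(p):
    return 1 / p, (1 - p) / p ** 2

print(uniform_stats(1, 6))      # (3.5, 2.916...)  fair die
print(binomial_stats(10, 0.5))  # (5.0, 2.5)       10 coin flips
print(geometric_stats(4 / 52))  # (13.0, 156.0)    draws until an ace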

Continuous Probability Distributions

  1. Continuous Uniform Distribution
    • Description: Similar to discrete uniform, but for continuous values.
    • PDF: ( f(x) = \frac{1}{b-a}, \quad a \leq x \leq b )
    • Expectation: ( (a+b)/2 )
    • Variance: ( (b-a)^2 / 12 )
  2. Normal (Gaussian) Distribution
    • Description: A symmetric distribution centered around the mean.
    • PDF: ( f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2 / (2\sigma^2)} )
    • Expectation: ( \mu )
    • Variance: ( \sigma^2 )
  3. Exponential Distribution
    • Description: Time between events in a Poisson process.
    • PDF: ( f(x) = \lambda e^{-\lambda x}, \quad x \geq 0 )
    • Expectation: ( 1/\lambda )
    • Variance: ( 1/\lambda^2 )
  4. Gamma Distribution
    • Description: Generalization of exponential; models time until the ( k )-th event in a Poisson process.
    • PDF: ( f(x; k, \lambda) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}, \quad x \geq 0 )
    • Expectation: ( k/\lambda )
    • Variance: ( k/\lambda^2 )
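
A small sketch evaluating two of the PDFs above at arbitrary, illustrative points:

import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def exponential_pdf(x, lam):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

print(normal_pdf(0, 0, 1))       # ~0.3989, the peak of the standard normal
print(exponential_pdf(1, 0.5))   # ~0.3033; mean = 1/lambda = 2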

CS 262 - Low-Level Programming in C

PCAL: The Compilation Process

You will likely be quizzed and tested on the PCAL process:

  • Preprocessing: Removes whitespace, comments, expands macros, and includes header files.
  • Compiling: Converts C source files into assembly code.
  • Assembling: Converts assembly into machine (binary) code.
  • Linking: Links object files together into an executable binary.

Use the gcc compiler to compile C files:

gcc hello.c -o hello  # Creates an executable named 'hello'

Basic Program Structure

All C programs begin execution from the main function:

int main() {
    return 0;
}

To print to the console, include the stdio library:

#include <stdio.h>
int main() {
    printf("hello");
    return 0;
}

Constants and Macros

Use #define to define constants and macros:

#define GREETING "hello"

#include <stdio.h>
int main() {
    printf(GREETING);
    return 0;
}

Reading Input

Use scanf() to read user input:

  • %d – Decimal integer
  • %f – Float
  • %e – Exponential
  • %g – Compact exponential/float
scanf("%d", &x);         // Reads integer into x
scanf("%d,%f", &x, &y);  // Reads integer and float

Data Type Conversions

  • int + float results in a float
  • int / int truncates the decimal (integer division)
  • int i = 33.4f; will truncate to 33

Comparison chaining like i < j < k does not work as expected: i < j evaluates to 0 or 1, which is then compared to k.

Booleans

C uses integers (0 = false, non-zero = true). Use stdbool.h for true and false keywords, but honestly, you can just use 0’s and 1’s and it works the same.

Function Parameters

  • Pass by value: Copies variable; original is unchanged.
  • Pass by reference: Pass pointer to variable; original can be modified.

C defaults to pass-by-value. Use pointers to simulate pass-by-reference.

Variable Scope

  • extern: Declares a global variable defined elsewhere.
  • static: Keeps variable persistent across function calls.

Arrays

int array[10] = {0};                    // All elements set to 0
int array[10] = {1, 2, 3};              // Remaining values default to 0
int array[10] = {[2] = 3, [7] = 4};     // Sparse initialization
int length = (int)(sizeof(array) / sizeof(int));

Function Declaration

<return-type> functionName(<parameters>) {
    // function body
    return <value>;  // Default return type is int
}

Arrays cannot be returned directly, but can be passed as parameters.

Exit

#include <stdlib.h>
exit(0);  // Terminates the program

Pointers

POINTERS — honestly one of the most important concepts in C.

Pointers store memory addresses:

int a = 5;
int *p = &a;  // p holds address of a
*p = 10;      // modifies value of a to 10
  • const int *p: Cannot modify value pointed to
  • int *const p: Cannot change address stored

Pointers & Arrays

Pointers and arrays are closely related, primarily due to memory’s structure as one big array.

You can use pointers instead of array indexing to retrieve a value at each index:

int a[10];
int *p = &a[2];  // Points to index 2
int *q = p + 3;  // Points to index 5

Multidimensional arrays are stored in row-major order, so incrementing a pointer walks across the columns of a row before moving to the next row.

int a[5][5];  // a → a[0]

Strings

String literals are char pointers:

char *s = "abc";

Use <string.h> for string functions:

  • strlen(str)
  • strcat(str1, str2)

Multifile Compilation

#include <filename>   // System headers
#include "filename"   // User-defined files

Gives your file access to functions and variables from other files. Creates a class-like feel to your code.

Preprocessor Directives

Preprocessors exist for:

  • File inclusions (see above)
  • Macro definitions:
    #define SPEED_OF_LIGHT 2.9
    #define IS_EVEN(n) (n % 2 == 0)
    #define HI(...) (printf(__VA_ARGS__))
    
  • Conditional compilation:
    #ifdef CROSS_COMPILE
    // only compiled for BeagleBone
    #endif
    

File I/O

FILE *fp = fopen("in.dat", "r");
fclose(fp);

Modes:

  • r – read
  • w – write (overwrites)
  • a – append

Functions:

  • fread(), fwrite()
  • fseek(), SEEK_SET, SEEK_END, SEEK_CUR

Dynamic Memory

Another really important concept in 262!!

You can claim memory from the heap directly in C:

  • malloc(size) – Allocates memory block
  • calloc(n, size) – Allocates and clears memory
  • realloc(ptr, new_size) – Resizes memory
  • free(ptr) – Frees memory to avoid leaks

C has no garbage collector — you must manually free what you allocate.

Structs

Group variables under one name:

struct Person {
    char name[50];
    int age;
    float salary;
};

struct Person p;
p.age = 5;
strcpy(p.name, "Bob");

With typedef:

typedef struct {
    char name[50];
    int age;
    float height;
} Person;

Person p1;
p1.age = 30;

Structs can be passed to and returned from functions. You can also have nested structs.

Unions

Similar to structs but all members share memory:

union {
    int i;
    double d;
} u;

Saves memory, but changing one affects all others. Don’t use often — makes code hard to read.

Enums

enum Suit { CLUBS, SPADES };

Just like Java enums. Assigns integer values to names.

Linked Lists

You’ll probably implement this in 3–4 classes (262, 367, 471, etc.), so honestly, implementing a linked list becomes second nature.

struct Node {
    int data;
    struct Node* next;
};

struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
newNode->data = value;
newNode->next = NULL;

Always use pointers in linked lists because the memory is dynamically allocated.

Bitwise Operations

Operate directly on bits:

int i = 4;   // 0100
i << 1;      // 1000 → 8
i >> 2;      // 0001 → 1

&  AND
|  OR
^  XOR
~  NOT

CS 310 - Data Structures

Time vs Space Complexity

  • Time complexity – How long it takes for an algorithm to run.
  • Space complexity – How much space is occupied by the algorithm

Trade-offs

  • When is it better to sacrifice time or storage?
  • Typically depends on readability, long-term use case, and how important latency is to the consumer

Dynamic Arrays

  • Arrays without a fixed size; you can add any number of elements.
  • Creation: Since arrays are of fixed size in low-level languages, to implement a dynamic array you have to:
    • Start with an initial size (user-defined or arbitrary)
    • As more values are added and you run out of space, create a new array with double the size
    • Shift all elements from the old array into the new one (an expensive operation, but not frequent)
  • Deletion:
    • To delete a particular index, remove the element and shift everything after it to the left
    • If the array becomes less than 1/3 full, you can consider shrinking (halving) the array to save space
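
A minimal sketch of the doubling/shrinking strategy described above (the initial capacity and the 1/3 threshold follow the notes; everything else is illustrative):

class DynamicArray:
    """Minimal sketch: grow by doubling, shrink by halving when < 1/3 full."""
    def __init__(self, capacity=4):
        self._data = [None] * capacity
        self._size = 0

    def append(self, value):
        if self._size == len(self._data):          # out of space -> double
            self._resize(2 * len(self._data))
        self._data[self._size] = value
        self._size += 1

    def delete_at(self, index):
        for i in range(index, self._size - 1):      # shift everything left
            self._data[i] = self._data[i + 1]
        self._size -= 1
        if 0 < self._size < len(self._data) // 3:   # too empty -> halve
            self._resize(len(self._data) // 2)

    def _resize(self, capacity):
        new_data = [None] * capacity
        for i in range(self._size):                 # copy (the expensive step)
            new_data[i] = self._data[i]
        self._data = new_data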

Linked Lists

  • A data structure in which objects that are not stored sequentially in memory can still be reached: each node stores a value plus a pointer/reference to the next object.
  • Helpful for memory because now consecutive chunks of memory are no longer required, making it easier to create and delete objects dynamically.

Creation

  • In Java, you can create objects for each item in the list.
  • In C/C++, use structs.
  • Typically each node has 2-3 fields: one for the value, one or two for the next/previous pointers.

Types

  • Singly Linked List: Only pointer to the next node; supports one-way traversal.
  • Doubly Linked List: Pointers to both next and previous nodes; easier to traverse and modify.
  • Circular Linked List: Tail points back to the head; forms a loop.

Stacks and Queues

Stack (LIFO)

  • First in, last out.

Queue (FIFO)

  • First in, first out.

Implementation

  • Can be implemented with arrays or linked lists.
  • Linked list implementation is often better for queues because it avoids shifting.
  • Array implementation typically uses two pointers (start and end).
  • When the end pointer gets to the end of the array, and the start pointer is not at index 0, it wraps around (circular buffer).
  • If the queue needs to be expanded, the size is doubled. Values are re-added starting from the start pointer to the end of the original array, and then from index 0 to the end pointer.
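
A minimal circular-buffer queue sketch following the wrap-around and doubling behavior described above:

class CircularQueue:
    """Array-backed queue with wrap-around pointers, doubling when full."""
    def __init__(self, capacity=4):
        self._data = [None] * capacity
        self._start = 0
        self._size = 0

    def enqueue(self, value):
        if self._size == len(self._data):
            self._grow()
        end = (self._start + self._size) % len(self._data)  # wrap around
        self._data[end] = value
        self._size += 1

    def dequeue(self):
        value = self._data[self._start]
        self._start = (self._start + 1) % len(self._data)
        self._size -= 1
        return value

    def _grow(self):
        # Re-add values in order, from the start pointer around to the end.
        old, n = self._data, len(self._data)
        self._data = [None] * (2 * n)
        for i in range(self._size):
            self._data[i] = old[(self._start + i) % n]
        self._start = 0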

Priority Queue

  • Special case where the highest (or lowest) priority element is removed first.
  • Typically implemented with a min or max heap.
  • Order is based on value, not insertion time.

Hashmaps / Hash Tables

  • Solve the problem of search time.
  • For other data structures, lookup is O(n); with hashmaps, it’s O(1) on average.

Implementation

  • Hash tables are implemented as arrays under the hood.
  • A hash function maps a key to an index:
    index = key % array_length  # simple but not great hash function
    
  • The value is stored at that index.

Collision Handling

What if two values map to the same index?

  1. Separate Chaining:
    • The array stores pointer objects (like linked lists).
    • Multiple values that hash to the same index are added to the list.
    • Downside: slower lookups due to list traversal and potential for all values to cluster in one index.
    • A fix: track the longest list; if it exceeds a threshold, rehash with a new function or bigger array.
  2. Linear Probing:
    • If index is taken, look at the next available index (index + 1, +2, …).
    • Pros: avoids wasting space; Cons: clustering and longer probe times.
  3. Quadratic Probing:
    • Similar to linear probing, but the index jump increases quadratically: index + n^2
  4. Double Hashing:
    • Apply another hash function if a collision occurs, and retry until an empty spot is found.

C++’s std::unordered_map uses separate chaining; open addressing (linear/quadratic probing) is common in other hash table implementations.

Eventually, as the table gets filled up, a new hash function is needed and the array has to be extended.
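
A minimal separate-chaining table sketch, using Python’s built-in hash as the hash function (rehashing/resizing is left out to keep it short):

class ChainedHashTable:
    """Separate chaining: each slot holds a list of (key, value) pairs."""
    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present -> overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # collision handled by chaining

    def get(self, key):
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)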


Graphs

  • Graphs are made up of vertices (nodes) and edges (arcs).
  • Edges are defined by vertex pairs.
  • If the pairs are ordered (e.g., [1,2] ≠ [2,1]), the graph is directed (directional edges).

Terminology

  • Path: Sequence of vertices with a valid edge between each pair.
  • Simple Path: No repeated vertices.
  • Cyclic: Start and end vertex are the same.
  • Weighted Path: Edges have associated “costs”.

Representations

Adjacency Matrix
  • nxn matrix where n is the number of vertices.
  • Rows = starting vertex, columns = ending vertex.
  • Best for dense graphs but uses a lot of space.
    |     | A | B | C | D | E |
    |-----|---|---|---|---|---|
    | A   | 0 | 1 | 1 | 1 | 0 |
    | B   | 1 | 0 | 0 | 0 | 1 |
    | C   | 1 | 0 | 0 | 0 | 1 |
    | D   | 1 | 0 | 0 | 0 | 0 |
    | E   | 0 | 1 | 1 | 0 | 0 |
    
Adjacency List
  • Each index represents a vertex.
  • Values are a list of vertices it’s connected to.
  • Great for sparse graphs (which is often the case).
    A: B, C, D
    B: A, E
    C: A, E
    D: A
    E: B, C
    

Graph Traversals

  • BFS: Queue system; level-order traversal.
  • DFS: Stack or recursion; post-order style traversal.
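
A small sketch of both traversals on the adjacency-list example above:

from collections import deque

graph = {"A": ["B", "C", "D"], "B": ["A", "E"], "C": ["A", "E"],
         "D": ["A"], "E": ["B", "C"]}   # adjacency list from above

def bfs(start):
    visited, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

def dfs(node, visited=None):
    visited = visited if visited is not None else set()
    visited.add(node)
    order = [node]
    for neighbor in graph[node]:
        if neighbor not in visited:
            order.extend(dfs(neighbor, visited))
    return order

print(bfs("A"))  # ['A', 'B', 'C', 'D', 'E']
print(dfs("A"))  # ['A', 'B', 'E', 'C', 'D']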

Minimum Spanning Tree (MST)

A spanning tree connects all vertices using the minimum number of edges (|V| − 1); a minimum spanning tree is the spanning tree with the lowest total edge weight.

  • Cannot contain cycles or be disconnected.
  • If using weights, it should have the lowest total weight.
  • A graph can have multiple MSTs; if all weights are unique, there is only one.

Prim’s Algorithm

  1. Eliminate self-loops and duplicate edges (keep the least weight).
  2. Pick a root node.
  3. Pick the least weight edge from the vertex to any unvisited vertex.
  4. Continue picking the minimum edge that connects to the growing tree.
  5. Repeat until all vertices are visited.

Kruskal’s Algorithm

  1. Eliminate self-loops and duplicates.
  2. Sort all edges in increasing order.
  3. Keep connecting edges (in order) as long as they don’t form a cycle.
  4. Stop once MST is formed.
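
A minimal Kruskal sketch using a union-find structure to detect cycles; the 4-vertex edge list is hypothetical:

def kruskal(num_vertices, edges):
    """edges: list of (weight, u, v). Returns the edges of one MST."""
    parent = list(range(num_vertices))

    def find(x):                                # find the root of x's component
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for weight, u, v in sorted(edges):          # edges in increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                            # skip edges that would form a cycle
            parent[ru] = rv
            mst.append((u, v, weight))
    return mst

# Hypothetical 4-vertex graph.
print(kruskal(4, [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]))
# [(0, 1, 1), (2, 3, 2), (1, 2, 3)]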

Topological Sort

  • Can only be performed on a Directed Acyclic Graph (DAG).
  • Common example: course scheduling.
  1. For each vertex, track number of inbound edges (in-degree).
  2. Find all vertices with in-degree 0; add them to result list.
  3. Decrease in-degrees of connected vertices.
  4. Repeat until all vertices are processed.
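
A minimal sketch of this in-degree (Kahn’s) approach; the course-prerequisite DAG is hypothetical:

from collections import deque

def topological_sort(graph):
    """graph: {vertex: [vertices it points to]}. Returns one valid ordering."""
    in_degree = {v: 0 for v in graph}
    for neighbors in graph.values():
        for v in neighbors:
            in_degree[v] += 1
    queue = deque(v for v, d in in_degree.items() if d == 0)  # no prerequisites
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            in_degree[w] -= 1          # "remove" the edge v -> w
            if in_degree[w] == 0:
                queue.append(w)
    return order

# Hypothetical prerequisite DAG: 110 -> 211, 211 -> 310 and 262.
print(topological_sort({"110": ["211"], "211": ["310", "262"], "310": [], "262": []}))
# ['110', '211', '310', '262']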

Heaps (Min/Max)

  • Can be implemented as trees.
  • Pain to insert/delete because of constant balancing.
  • Min heap: smallest element at the top.
  • Max heap: largest element at the top.
  • Used in priority queues.
  • Honestly, you don’t need to know much detail about this.

Trees

  • Trees represent hierarchical structures.
  • Binary Tree: Each node has at most 2 children (outbound edges) and at most 1 inbound edge.
  • Leaf nodes have no children.
  • Root node has no parent.
  • Usually implemented with linked nodes that hold left/right child pointers. Can also be implemented using arrays.

Types of Binary Trees

Full/Proper Binary Tree

Each node has either 0 or 2 children.

      A
     / \
    B   C
   / \
  D   E
Complete Binary Tree

All levels are fully filled except the last, which is filled left to right.

      A
     / \
    B   C
   / \ /
  D  E F
Perfect Binary Tree

Both full and complete, with all leaves at the same depth.

      A
     / \
    B   C
   / \ / \
  D  E F  G
Degenerate Tree

All internal nodes have one child.

A
 \
  B
   \
    C
   /
  D

Tree Traversals (using linked list nodes)

Inorder: left, root, right

def inorder(node):
    if node is None:
        return
    inorder(node.left)
    print(node.val)
    inorder(node.right)

Postorder: left, right, root

def postorder(node):
    if node is None:
        return
    postorder(node.left)
    postorder(node.right)
    print(node.val)

Preorder: root, left, right

def preorder(node):
    if node is None:
        return
    print(node.val)
    preorder(node.left)
    preorder(node.right)

Constructing a Tree from Array Representation

Given Preorder + Inorder → build from the left

Preorder:   [A, B, D, E, C, F, G]
Inorder:    [D, B, E, A, F, C, G]
      A
     / \
    B   C
   / \ / \
  D  E F  G

Given Postorder + Inorder → build from the right

Postorder:  [D, E, B, F, G, C, A]
Inorder:    [D, B, E, A, F, C, G]
      A
     / \
    B   C
   / \ / \
  D  E F  G
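
A small sketch of the preorder + inorder reconstruction, reproducing the tree above (nodes are nested dicts for simplicity):

def build_tree(preorder, inorder):
    """Rebuild a binary tree (as nested dicts) from preorder + inorder lists."""
    if not preorder:
        return None
    root = preorder[0]                       # preorder: root comes first
    split = inorder.index(root)              # inorder: left of root vs right of root
    left_in, right_in = inorder[:split], inorder[split + 1:]
    left_pre = preorder[1:1 + len(left_in)]
    right_pre = preorder[1 + len(left_in):]
    return {"val": root,
            "left": build_tree(left_pre, left_in),
            "right": build_tree(right_pre, right_in)}

tree = build_tree(["A", "B", "D", "E", "C", "F", "G"],
                  ["D", "B", "E", "A", "F", "C", "G"])
print(tree["val"], tree["left"]["val"], tree["right"]["val"])  # A B C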

Other Tree Types (Rarely Tested in Detail)

  • Binary Search Tree (BST): For every node, everything in its left subtree is smaller and everything in its right subtree is larger.
  • AVL Tree: A BST that can self-balance when one side becomes deeper.
  • Red-Black Tree: A more complex self-balancing BST. (You might have 5 pages of notes on this – likely won’t use them.)
  • B Tree: Balanced m-way tree. Binary trees have 2 children; B trees can have many. Each node can hold multiple values.
  • B+ Tree: Like a B Tree, but all data is in the leaf nodes only.

Here’s a table comparing all the data structures.

| Data Structure     | Access   | Insert   | Delete   | Search   | Notes |
|--------------------|----------|----------|----------|----------|-------|
| Array              | O(1)     | O(n)     | O(n)     | O(n)     | Fixed size (in C), insert/delete requires shifting |
| Dynamic Array      | O(1)     | O(1)*    | O(n)     | O(n)     | *Amortized insert at end, resizing occasionally |
| Singly Linked List | O(n)     | O(1)**   | O(1)**   | O(n)     | **O(1) at head, O(n) at tail or specific index |
| Doubly Linked List | O(n)     | O(1)**   | O(1)**   | O(n)     | Better for bidirectional traversal |
| Stack (Array/LL)   | O(n)     | O(1)     | O(1)     | O(n)     | LIFO: push/pop at top |
| Queue (Array/LL)   | O(n)     | O(1)     | O(1)     | O(n)     | FIFO: enqueue at rear, dequeue at front |
| Deque              | O(1)     | O(1)     | O(1)     | O(n)     | Double-ended queue, insert/delete both ends |
| Hash Table         | O(1)     | O(1)     | O(1)     | O(1)     | O(n) worst-case due to collisions, requires hashing |
| Binary Search Tree | O(log n) | O(log n) | O(log n) | O(log n) | Must be balanced (e.g., AVL, Red-Black) |
| AVL Tree           | O(log n) | O(log n) | O(log n) | O(log n) | Self-balancing BST |
| Heap (Min/Max)     | O(n)     | O(log n) | O(log n) | O(n)     | Efficient for priority queue, top is min/max |
| Trie (Prefix Tree) | O(k)     | O(k)     | O(k)     | O(k)     | k = length of string; used for autocomplete, dictionaries |
| Graph (Adj List)   | O(1)     | O(1)     | O(1)     | O(V + E) | Good for sparse graphs |
| Graph (Adj Matrix) | O(1)     | O(1)     | O(1)     | O(V²)    | Good for dense graphs |

CS 330 - Formal Methods and Models

Propositional Logic

Propositions are statements that are either true or false (but not both).

  • Examples:
    • “It is raining.” (True/False)
    • “3 + 4 = 7” (True)

If a and b are propositions, then:

  • a ∨ b (logical OR) is also a proposition.

Logical Equivalence: Two propositions are logically equivalent if their truth tables are the same.

Tautology: A proposition that is always true. Contradiction: A proposition that is always false.

Advanced Example: Show that ((p ∧ q) → r) ≡ (¬r → ¬(p ∧ q))

  • Use contrapositive: p → q ≡ ¬q → ¬p
  • Hence, equivalent by contrapositive.

Set Theory

  • A set is a collection of distinct elements.
    • Extensional: {2, 3, 4}
    • Intensional: {x | x < 10}

Notation:

  • ∅ = Empty set
  • x ∈ S = x is an element of set S
  • R ⊆ S = R is a subset of S (may be equal)
  • T ⊂ S = T is a proper subset (T ≠ S)
  • |S| = Cardinality (number of elements)
  • 2^S = Powerset (set of all subsets of S)

Advanced Example: Let A = {1, 2} and B = {x | x < 4}. Then 2^A = {∅, {1}, {2}, {1, 2}}, and (taking B over the positive integers) A ∪ B = {1, 2, 3}

Functions:

  • Total function: Every input has a mapped output.
  • Partial function: Some inputs may not have outputs.

Logical Operators

  • ¬ or ~ → NOT
  • ∧ → AND
  • ∨ → OR
  • → → Implication (if…then)
  • ↔ → Bi-conditional (if and only if)

Important Logical Laws:

  • Contrapositive: p → q is equivalent to ¬q → ¬p
  • Conditional Law: p → q ≡ ¬p ∨ q
  • Distributive: p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
  • Subsumption: p ∧ (p ∨ q) ≡ p
  • De Morgan’s Laws:
    • ¬(p ∧ q) ≡ ¬p ∨ ¬q
    • ¬(p ∨ q) ≡ ¬p ∧ ¬q

Truth Table Example

| p | q | p → q | ¬q | ¬q → ¬p |
|---|---|-------|----|---------|
| T | T | T     | F  | T       |
| T | F | F     | T  | F       |
| F | T | T     | F  | T       |
| F | F | T     | T  | T       |

Demonstrates equivalence of implication and contrapositive.


Proofs and Reasoning

  • A proof is a sequence of logically valid steps to demonstrate the truth of a statement.
  • Theorem: Proven statement
  • Axiom: Accepted as true without proof
  • Inference rules: Methods to derive conclusions
  • Assumptions: Temporary truths to explore implications (shown in square brackets: [p])

Common Rules:

  • Modus Ponens: From p and p → q, conclude q
  • Modus Tollens: From p → q and ¬q, conclude ¬p

Example Proof:

[p → q]    (Assumption)
[p]        (Assumption)
[q]        (From 1, 2 by Modus Ponens)

Advanced Example: Prove: ((p → q) ∧ (q → r)) → (p → r)

  1. Assume (p → q) and (q → r)
  2. Assume p
  3. From p → q and p, conclude q
  4. From q → r and q, conclude r
  5. Therefore, p → r

Predicate Logic

  • A predicate is a function whose output is either true or false.
    • n > 3 is a predicate.
    • 5 > 3 is a proposition (concrete value).

Truth Set: The set of all values that make a predicate true.

Quantifiers:

  • Universal Quantifier: ∀x P(x) — for all x, P(x) holds
  • Existential Quantifier: ∃x P(x) — there exists an x such that P(x)

Negation:

  • ¬∃x P(x) ≡ ∀x ¬P(x)
  • ¬∀x P(x) ≡ ∃x ¬P(x)

Example:

  • ∀x (x > 0 → x^2 > 0)
  • ∃x (x^2 = 4) ⇒ x could be 2 or -2

Sequences and Program Verification

  • A sequence is an ordered collection of elements (repeats allowed).

Program Verification (Partial Correctness):

  • Prove that the output is correct if the program terminates.
  • Hoare Triple: {P} C {Q} where:
    • P is the precondition
    • C is the code/command
    • Q is the postcondition

Example:

{even(x)} x = x + 3; {¬even(x)}

Advanced Example:

{x = n ∧ n ≥ 0}
sum = 0;
i = 1;
while (i ≤ n) {
    sum = sum + i;
    i++;
}
{sum = n(n+1)/2}

Normal Forms

  • Disjunctive Normal Form (DNF): OR of ANDs
    • (p ∧ q) ∨ (¬p ∧ r)
  • Conjunctive Normal Form (CNF): AND of ORs
    • (p ∨ q) ∧ (¬p ∨ r)

Formal Languages

  • Language: A set of strings
  • Alphabet (Σ): Set of symbols (e.g., {a, b})
  • Σ*: Set of all possible strings using the alphabet
  • L ⊆ Σ*: Language is a subset of all strings

Examples:

  • Alphabet: {a, b}
  • String: abba
  • Language: {a, ab, abb}
  • Empty string: ε (also a valid string in many languages)

Advanced Example:

  • Language: All strings over {a, b} where the number of a’s is even and total length is divisible by 3

Regular Languages and Regex

Regular Languages are defined by finite state automata and closed under union, concatenation, and Kleene star.

Examples:

  • Strings ending in “ab”
  • Strings consisting only of 0s, with an even number of them: regex = (00)*

Regex Examples:

  • (a|b)*: All strings of a’s and b’s
  • a*b+: One or more b’s preceded by zero or more a’s

Advanced Regex:

  • Strings that contain “aba” as a substring: .*aba.*

Regular Grammars

Productions:

  • X → aX, X → a, X → ε

Example (basic): Strings ending in “a”:

S → aS | bS | a

Advanced Example: Strings with at least one a followed by one b:

S → aA
A → bB | aA
B → aB | bB | ε

Finite Automata (DFA)

DFA Example: Strings ending in “ab”

  • States: q0 (start), q1, q2 (accept)
  • Transitions (q0 = no progress, q1 = last char was “a”, q2 = last two chars were “ab”):
    • q0 --a--> q1, q0 --b--> q0
    • q1 --a--> q1, q1 --b--> q2
    • q2 --a--> q1, q2 --b--> q0

Example: Accept strings with even number of a’s and odd number of b’s

  • Need 4 states to track parity of a and b
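
A minimal simulation of the ends-in-“ab” DFA above (the transition table matches the one listed):

# DFA for strings over {a, b} that end in "ab": q2 is the accepting state.
transitions = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}

def accepts(s):
    state = "q0"
    for ch in s:
        state = transitions[(state, ch)]
    return state == "q2"

print(accepts("aab"))   # True  (ends in ab)
print(accepts("aba"))   # False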

NFA (Non-deterministic Finite Automata)

NFA Example: Accept strings with a substring “aba”

  • Transitions:
    • q0 --a--> q1
    • q1 --b--> q2
    • q2 --a--> q3 (accept)
    • Also include q0 transitions to itself on any character (to skip a prefix) and q3 transitions to itself on any character (so extra characters after “aba” are still accepted)

Pushdown Automata (PDA)

Used for languages like balanced parentheses:

PDA Example: Recognize L = { a^n b^n | n ≥ 0 }

  • Push an a on the stack for each input a
  • Pop an a from stack for each input b
  • Accept if stack is empty at end

Advanced PDA Example: L = { a^n b^m c^n | n, m ≥ 0 }

  • Push for a, ignore b, pop for c
  • Accept on empty stack
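
A small stack-based checker mimicking the a^n b^n PDA above:

def accepts_anbn(s):
    """Stack-based check for L = { a^n b^n | n >= 0 }, mimicking the PDA above."""
    stack = []
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:                 # an 'a' after a 'b' is rejected
                return False
            stack.append("a")          # push for each a
        elif ch == "b":
            seen_b = True
            if not stack:
                return False
            stack.pop()                # pop for each b
        else:
            return False
    return not stack                   # accept iff the stack is empty

print(accepts_anbn("aabb"), accepts_anbn("aab"))  # True False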

Context-Free Grammars (CFG)

CFGs can express nested patterns like palindromes and balanced expressions.

Example:

S → aSb | ε

Generates strings like: “”, “ab”, “aabb”, “aaabbb”, etc.

Advanced CFG: Palindromes over {a, b}:

S → aSa | bSb | a | b | ε

Turing Machines

Turing machines use an infinite tape to simulate any computation.

Basic Example: Accept strings with equal number of as and bs

  • Scan tape marking a with X, search and mark b with Y, repeat

Advanced Example: Binary incrementer

  • Input: Binary number like 1101
  • Operation: Increment by 1 → 1110
  • Turing Machine walks right, flips trailing 1s to 0s, then flips first 0 to 1

CS 484 - Data Mining

Note: I feel like they try to scare you away in this class with a lot of math, but push through it — I promise it becomes more manageable later. You’re allowed a cheat sheet, so just write down all the math. I high-key forgot half of linear algebra by the time I took this class. My notes don’t have too much math, probably because I myself didn’t understand it.


Data Mining

Data Mining is the process of discovering patterns and anomalies in large datasets.

Data = collection of data objects

  • Discrete attributes: Categorical values
    • Nominal: no order (e.g., gender, color)
    • Ordinal: ordered (e.g., star ratings)
  • Continuous attributes: Numeric
    • Interval (e.g., temperature)
    • Ratio (e.g., weight)

Data Cleaning

Raw data isn’t ready for machine learning. It often has:

  • Missing values
  • Duplicates
  • Non-numerical data (qualitative)
  • Skewed distributions

Solutions include:

  • Replace missing values with mean, median, or mode
  • Merge or remove duplicates
  • One-hot encoding/vectorization for categorical variables
  • Normalize/standardize data (e.g., z-score normalization)
  • Dimensionality reduction to remove redundancy

K-Nearest Neighbors (KNN)

KNN is a lazy learning algorithm that classifies a point based on its k nearest labeled neighbors.

  • Time Complexity: O(nD + nK)
  • Hyperparameters:
    • k: Number of neighbors
    • Distance metric (Euclidean, Manhattan, Cosine)

Effects of k:

  • Small k → overfitting
  • Large k → underfitting

Variants:

  • Condensed KNN: Keeps only decisive points (usually boundary cases)
  • Distance weighting: Closer points are weighted more heavily

Validation Techniques

  • Validation Set: Separate data for model evaluation
  • Cross-Validation: Split training data into k folds; each fold used once for validation

Model Evaluation: Confusion Matrix

Binary Classification Example

Suppose we have the following results from a binary classifier:

  • True Positives (TP): 50
  • False Negatives (FN): 10
  • False Positives (FP): 5
  • True Negatives (TN): 100
|                 | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | TP = 50            | FN = 10            |
| Actual Negative | FP = 5             | TN = 100           |

Calculations:

  • Recall (Sensitivity): Recall = TP / (TP + FN) = 50 / (50 + 10) = 50 / 60 = 0.833

  • Precision: Precision = TP / (TP + FP) = 50 / (50 + 5) = 50 / 55 = 0.909

  • Accuracy: Accuracy = (TP + TN) / Total = (50 + 100) / (50 + 100 + 5 + 10) = 150 / 165 = 0.909

  • F1 Score: F1 = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.909 * 0.833) / (0.909 + 0.833) = 2 * 0.757 / 1.742 ≈ 0.87
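
A small helper that reproduces the binary-classification numbers above:

def binary_metrics(tp, fn, fp, tn):
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, accuracy, f1

# Reproduces the example above: TP=50, FN=10, FP=5, TN=100.
print(binary_metrics(50, 10, 5, 100))
# (0.833..., 0.909..., 0.909..., 0.869...)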

ROC & AUC

  • ROC Curve: Plots TPR vs FPR
  • AUC (Area Under Curve): Indicates model’s ability to distinguish between classes. 1 = perfect, 0.5 = random guessing

Multi-Class Classification Example

Suppose we have a 3-class problem:

| Class | TP | FP | FN |
|-------|----|----|----|
| A     | 40 | 10 | 5  |
| B     | 30 | 15 | 10 |
| C     | 50 | 5  | 20 |

Per-Class Metrics:

  • Class A:
    • Precision = 40 / (40 + 10) = 0.80
    • Recall = 40 / (40 + 5) = 0.89
  • Class B:
    • Precision = 30 / (30 + 15) = 0.67
    • Recall = 30 / (30 + 10) = 0.75
  • Class C:
    • Precision = 50 / (50 + 5) = 0.91
    • Recall = 50 / (50 + 20) = 0.71

Macro-Averaged Metrics:

  • Macro Precision: (0.80 + 0.67 + 0.91) / 3 = 0.793
  • Macro Recall: (0.89 + 0.75 + 0.71) / 3 = 0.783
  • Macro F1 Score:
    • F1_A = 2 * (0.80 * 0.89) / (0.80 + 0.89) = 0.84
    • F1_B = 2 * (0.67 * 0.75) / (0.67 + 0.75) = 0.71
    • F1_C = 2 * (0.91 * 0.71) / (0.91 + 0.71) = 0.80
    • Macro F1 = (0.84 + 0.71 + 0.80) / 3 = 0.783

Micro-Averaged Metrics:

  • Total TP = 120, FP = 30, FN = 35
  • Micro Precision = 120 / (120 + 30) = 0.80
  • Micro Recall = 120 / (120 + 35) ≈ 0.774
  • Micro F1 = 2 * (0.80 * 0.774) / (0.80 + 0.774) ≈ 0.787

PCA (Principal Component Analysis)

Goal: Reduce dimensionality while preserving variance

PCA Steps:

  1. Standardize the data (subtract mean, divide by std deviation)
  2. Compute the covariance matrix of the centered data: S = (1/(n-1)) * X^T * X
  3. Compute eigenvalues and eigenvectors of the covariance matrix
  4. Sort eigenvalues in descending order and pick top k eigenvectors
  5. Project the original data X onto the new subspace using the selected eigenvectors

Worked Example: Suppose we have 2D points: (2, 0), (0, 2), (3, 1), (1, 3)

  1. Standardize → subtract mean: mean_x = 1.5, mean_y = 1.5
  2. Centered data: (0.5, -1.5), (-1.5, 0.5), (1.5, -0.5), (-0.5, 1.5)
  3. Covariance matrix (sample covariance, 1/(n-1)) S =
    [[ 1.67, -1.00],
     [-1.00,  1.67]]
    
  4. Eigenvalues: λ1 ≈ 2.67, λ2 ≈ 0.67 → keep the component with λ1 ≈ 2.67
  5. Project the data onto the principal direction (the eigenvector for λ1 ≈ 2.67)
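
A quick NumPy check of the worked example (np.cov uses the sample covariance, i.e., the 1/(n-1) convention):

import numpy as np

X = np.array([[2, 0], [0, 2], [3, 1], [1, 3]], dtype=float)
X_centered = X - X.mean(axis=0)                 # step 1: subtract the mean
S = np.cov(X_centered, rowvar=False)            # step 2: sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)            # step 3: eigen-decomposition
print(S)            # [[ 1.67 -1.  ] [-1.    1.67]] (rounded)
print(eigvals)      # [0.67 2.67] -> keep the component with eigenvalue ~2.67
projected = X_centered @ eigvecs[:, -1]         # step 5: project onto top component
print(projected)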

Decision Trees

Used for classification by creating splits on attributes.

How to choose split?

  • Gini Index: Gini(D) = 1 - Σ (p_i)^2
  • Information Gain: IG(D, A) = Entropy(D) - Σ (|D_i| / |D|) * Entropy(D_i)

Gini Index Example: Consider a binary dataset with 6 samples: 4 Positive, 2 Negative

Gini = 1 - (4/6)^2 - (2/6)^2 = 1 - 0.444 - 0.111 = 0.444

Split on an attribute A results in:

  • Left split: 2P, 1N → Gini = 1 - (2/3)^2 - (1/3)^2 = 0.444
  • Right split: 2P, 1N → same Gini = 0.444 Overall Gini = 0.444

Information Gain Example: Original entropy:

Entropy = - (4/6)log2(4/6) - (2/6)log2(2/6) ≈ 0.918

If attribute split gives:

  • Left: 2P, 1N → Entropy = 0.918
  • Right: 2P, 1N → same Entropy IG = 0.918 - (3/6)0.918 - (3/6)0.918 = 0 So no gain — try different attribute
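
Small helper functions reproducing the Gini and entropy numbers above:

import math

def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

print(gini([4, 2]))      # 0.444...  (parent node above)
print(entropy([4, 2]))   # 0.918...
print(gini([2, 1]))      # 0.444...  (each child split)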

Linear Regression

Goal: Predict a continuous output from a linear combination of features. Essentially a glorified y = mx + b, but with as many variables (the x’s) as needed to represent all the features. The main goal is to find the weight vector (the m) that minimizes the overall RSS.

Model:

y = w_0 + w_1x_1 + w_2x_2 + ... + w_nx_n

Loss Function (RSS):

RSS = Σ(y_i - ŷ_i)^2 = Σ(y_i - (w^T x_i))^2

Gradient Descent Update Rule:

w_j = w_j - α * ∂RSS/∂w_j = w_j - α * Σ(2 * (y_i - ŷ_i) * -x_ij)

Step size is an important hyperparameter: too small and learning is too slow; too big and the weights bounce back and forth across the minimum.

Stopping rule: if the validation loss changes by less than some threshold, stop training.

Example:

  • Dataset: {(x=1, y=2), (x=2, y=4)}
  • Start with: w = 0, b = 0
  • Predictions: ŷ = 0 for both points
  • Errors: e = y - ŷ = [2, 4]
  • Gradient: ∂RSS/∂w = -2 * (2*1 + 4*2) = -20
  • Update: w = 0 + α * 20
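
The same single gradient step in code; the learning rate α = 0.01 is an arbitrary choice for illustration:

# One gradient-descent step on the tiny dataset above (w = 0, alpha = 0.01).
xs, ys = [1, 2], [2, 4]
w, alpha = 0.0, 0.01

preds = [w * x for x in xs]                                      # all predictions are 0
grad = -2 * sum((y - p) * x for x, y, p in zip(xs, ys, preds))   # = -20
w = w - alpha * grad                                             # w = 0 + 0.01 * 20 = 0.2
print(grad, w)                                                   # -20.0 0.2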

Regularization:

  • L1 (Lasso): λ * Σ|w_i| → sparse weights – good for feature selection
  • L2 (Ridge): λ * Σw_i^2 → smaller weights – decreases outlier sensitivity

Logistic Regression

Used for binary classification because the sigmoid function is bounded between 0 and 1.

Bernoulli Model:

P(y|x) = σ(w^T x) = 1 / (1 + e^{-w^T x})

Loss Function (Binary Cross Entropy):

L = -Σ [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)]

Multiclass (Softmax): The output of softmax is a list of probabilities summing to 1. The number which is the highest is the most likely class it belongs to.

P(y=j|x) = e^{z_j} / Σ e^{z_k}

Cross-Entropy Loss (Softmax):

L = -Σ y_ij * log(softmax_j)

Gradient:

∂L/∂w_j = Σ (p_j - y_j) * x

Perceptrons and Neural Networks

Perceptron: A single-layer neural network that takes in multiple inputs, applies weights, sums them, and uses an activation function to produce a binary output.

  • Training: Perceptrons learn by adjusting weights after each sample. It compares the predicted output with the true label, calculates the error, and updates the weights using the perceptron learning rule.
  • Update rule:
    w = w + α * (y - ŷ) * x
    

Example: Suppose we want a perceptron that performs logical AND:

  • Input: x1, x2 ∈ {0,1}
  • Weights: w1 = 1, w2 = 1, bias = -1.5
  • Activation: output = 1 if (x1w1 + x2w2 + bias) > 0, else 0

Truth table:

  • x1=0, x2=0 → output = 0
  • x1=1, x2=0 → output = 0
  • x1=0, x2=1 → output = 0
  • x1=1, x2=1 → output = 1
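
The same AND perceptron in code, using the weights and bias from the example:

def perceptron_and(x1, x2, w1=1, w2=1, bias=-1.5):
    """AND gate from the example: fire only when the weighted sum is positive."""
    return 1 if x1 * w1 + x2 * w2 + bias > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron_and(x1, x2))
# 0 0 0 / 0 1 0 / 1 0 0 / 1 1 1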

Multilayer Perceptron (MLP):

  • A type of feedforward neural network with one or more hidden layers.
  • Forward propagation: Moves information forward through layers.
  • Backward propagation: Sends error backward and updates weights using gradients.
  • Hidden layers:
    • Capture complex, non-linear relationships
    • Automatically learn feature combinations and perform dimensionality reduction
  • Increasing neurons allows learning more complex patterns
  • Increasing hidden layers increases the abstraction and hierarchy in the model

Too many neurons/layers → risk of overfitting. Too few → underfitting.


Activation Function Options:

  • Sigmoid: 1 / (1 + e^{-x}) — Not recommended in hidden layers due to vanishing gradient problem, but fine in output layers for binary classification.
  • Tanh: Similar to sigmoid but ranges [-1,1] — zero-centered.
  • ReLU: max(0, x) — Default choice for hidden layers; efficient and helps convergence.
  • Leaky ReLU: Outputs a small negative value for negative inputs instead of zero.
  • SELU: Scaled ELU — good for fully connected layers and self-normalizing networks.

Dropout Layers:

  • Each neuron has a probability of being ignored (set to 0) during training.
  • Used to prevent overfitting and improve generalization.
  • Only active during training, not testing.

Block Feature Vectors:

  • Used especially for text — splits input into fixed-sized blocks and processes each independently.

Generative vs. Discriminative Models:

  • Generative: Model the full data distribution P(x, y) to understand and generate data (e.g., Naive Bayes, GANs).
  • Discriminative: Focus only on P(y|x) — learning boundaries between classes (e.g., logistic regression, SVM).

Deep Learning:

  • More neurons + more layers = deeper models
  • Necessary for feature abstraction and complex data types like text, speech, or images

Convolutional Neural Networks (CNNs):

  • Specially designed for image and grid-based data.
  • Input: Raw image (matrix of pixels)
  • Convolutional filter: Learns local patterns like edges, corners
  • Feature map: Output of applying a filter
  • Activation function: Adds non-linearity to feature maps
  • Pooling layer: Downsamples feature maps, often with max pooling
  • Flatten: Converts pooled maps into a 1D vector for fully connected layers

Gradient Optimizers:

  • SGD: Stochastic Gradient Descent — simple and widely used
  • Adagrad: Adapts learning rate per feature; helps with sparse data
  • RMSprop: Adjusts learning rate using a moving average of gradients
  • Adam: Combines RMSprop and momentum; widely used for deep learning

Batch Normalization:

  • Normalizes activations in hidden layers
  • Helps mitigate exploding/vanishing gradients
  • Stabilizes and accelerates training

MLP Architecture:

  • Input → Hidden Layers → Output
  • Use activation functions like ReLU, sigmoid, tanh

Backpropagation:

  • Compute error from output
  • Propagate error backward through layers
  • Update weights using gradient descent

Dropout: Randomly drop neurons during training to reduce overfitting.


Recommender Systems

Collaborative Filtering:

  • User-based filtering: Predict the rating of user u on item i from the ratings given by similar users v:
    r̂_ui = r̄_u + (Σ sim(u,v) * (r_vi - r̄_v)) / Σ |sim(u,v)|
    
  • Item-based filtering: Predict the rating from the similarity between item i and the items user u has already rated.

Computation heavy! Hence matrix factorization, which decomposes the ratings matrix into two smaller matrices and creates latent factors that are not explicitly defined but inferred from the data.

Matrix Factorization:

  • Approximate R ≈ U * V^T
  • Optimize latent features U and V using SGD
  • Loss: (R - UV^T)^2 + λ (||U||^2 + ||V||^2)

Example: User-Item ratings:

    M1 M2 M3
U1:  5  ?  3
U2:  4  2  ?

After factorization:

  • Estimate U1’s missing rating using dot product of U1-row and M2-column in factorized matrices

Association Rule Mining

Terms:

  • Support: Fraction of transactions with both A and B
  • Confidence: P(B | A) = support(A and B) / support(A)

Apriori Principle: All subsets of a frequent itemset must be frequent.

Example: Transactions:

  1. {milk, bread}
  2. {milk, butter}
  3. {bread, butter}
  4. {milk, bread, butter}

Support:

  • support({milk, bread}) = 2/4 = 0.5

Confidence:

  • confidence(milk → bread) = support(milk & bread) / support(milk) = 0.5 / 0.75 = 0.67

Clustering

K-means Clustering:

  1. Choose k initial centroids
  2. Assign each point to the nearest centroid
  3. Update centroids as the mean of points in each cluster
  4. Repeat until convergence

Example:

  • Data: [1, 2, 10, 11]
  • Initial centroids: 1 and 10
  • Iteration 1: [1, 2] → Cluster 1, [10, 11] → Cluster 2
  • New centroids: 1.5 and 10.5 → converged

Hierarchical Clustering:

  • Agglomerative: Merge closest clusters iteratively
  • Divisive: Split cluster until each point stands alone

DBSCAN:

  • Parameters: eps and minPts
  • Classify core, border, and noise points
  • Example: Points (1,2,3,10,11); eps=2, minPts=2
    • Core: {1,2,3}, Cluster 1
    • Core: {10,11}, Cluster 2

Clustering Evaluation

  • SSE (Sum of Squared Errors): Measures cohesion
  • SSB (Between groups): Measures separation
  • Homogeneity: All clusters contain only members of a single class
  • Completeness: All members of a given class are in the same cluster
  • V-measure: Harmonic mean of homogeneity and completeness

CS 478 - Natural Language Processing

Logistic Regression

Sentiment analysis problem – binary classification (is this tweet positive or negative?).

Feature Extraction: Create a large vector containing every single word in the vocabulary and assign value 1 if the word is present in the tweet or 0 if not.

  • Problem: Very sparse vector since most tweets will not have many overlapping words, resulting in slow prediction.

Alternate Strategy:

  • Create classes with positive and negative word connotations.
  • For a given tweet, count the number of positive and negative words. Whichever class has more words determines the sentiment.

Preprocessing a Tweet:

  • Remove common words (“a”, “the”, etc.)
  • Remove punctuation
  • Remove Twitter handles and URLs

Bag of Words Approach:

  • Convert all words to lowercase
  • Perform stemming (e.g., “stopped” → “stop”)
  • Compare each word to a positive/negative dictionary and count occurrences

Example:

  • Tweet: “I loved the new Batman movie! Absolutely amazing.”
  • After cleaning: [“love”, “new”, “batman”, “movie”, “absolutely”, “amazing”]
  • Match with dictionary: 3 positive words → classify as positive sentiment

Text Normalization

  • Tokenization – Identify words in a sentence
  • Lemmatization – Normalize words to their root form (“am”, “are”, “is” → “be”)
  • Stemming – Remove suffixes (“happily” → “happi”)
  • Segmentation – Break text into sentences

Problems with Tokenization:

  • Hyphens
  • Acronyms
  • Contractions
  • Multi-word units (“New York”)

Ambiguity:

  • Prepositional phrase attachment is confusing (“I saw the man with a telescope”)

Byte Pair Encoding (BPE)

Start with every character as a token and build a dictionary by:

  1. Merging adjacent tokens
  2. Tracking frequency
  3. Keeping merges that occur above a frequency threshold
  4. Repeating for groups of 3, 4… until frequency drops or no more segments

Example:

  • Text: “low lower lowest”
  • Initial tokens: [l, o, w], [l, o, w, e, r], [l, o, w, e, s, t]
  • Merge most frequent pair: (l, o) → lo
  • New tokens: [lo, w], [lo, w, e, r], …
  • Continue merging: (lo, w) → low, (low, e) → lowe, etc.

Purpose: Handles rare/unseen words better; enables subword segmentation.


N-Gram Modeling

Predict the next word based on the n-1 previous words.

  • Unigram: P(word)
  • Bigram: P(w2 | w1)
  • Trigram: P(w3 | w1, w2)

Example: Corpus: “I love NLP. I love AI.”

  • Bigrams: (“I love”), (“love NLP”), (“love AI”)
  • P(“love” | “I”) = 2/2 = 1
  • P(“AI” | “love”) = 1/2 = 0.5
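
A minimal bigram-count sketch reproducing the probabilities above (the period is kept as its own token for simplicity):

from collections import Counter

tokens = "I love NLP . I love AI .".split()
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)

def bigram_prob(w1, w2):
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("I", "love"))     # 1.0
print(bigram_prob("love", "AI"))    # 0.5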

Issues: Zero probability for unseen n-grams.

Solutions:

  • Laplace Smoothing: Add 1 to all counts.
  • Interpolation: Combine uni-, bi-, and tri-gram models linearly.
  • Backoff: If trigram not seen, use bigram, then unigram.

Evaluation Metrics

  • Log-Likelihood: Measures how likely a model generated a sequence. Avoids underflow by summing log probabilities.
    LL = Σ log P(w_i | w_{i-1})
    
  • Cross Entropy:
    H(p) = -1/N Σ log_2 P(w_i)
    

    Lower = better

  • Perplexity:
    PP = 2^H(p)
    
  • Lower perplexity = better; interpreted as the average number of choices per word

Example: Suppose cross-entropy H(p) = 1.5 → PP = 2^1.5 ≈ 2.828 → model has ~2.8 likely choices per word


Decoding Strategies

  • Greedy: Always pick the highest probability word
  • Beam Search: Explore top n paths and keep the best long-term one
  • Sampling: Randomly pick a word based on probabilities
  • Top-k Sampling: Randomly choose from top-k most likely words

Neural Networks

See Data Mining section. Same architecture and training applies.


Zipf’s Law

The frequency of a word is inversely proportional to its rank:

  • Rank 1: “the” → appears 22000 times
  • Rank 2: “of” → appears 11000 times

Word Embeddings

Embeddings are dense vector representations of words, often trained via:

  • Word2Vec
    • CBOW: Predict target from context
    • Skip-gram: Predict context from target

Distributional Representation:

  • Based on co-occurrence — similar words share contexts
  • E.g., “cat” and “dog” occur with “fur”, “pet”, “tail” → similar vectors

Distributed Representation:

  • Encodes concepts on separate dimensions
  • E.g., “bank” → [finance_dim=1, river_dim=0.3]

Count-Based Embeddings

Build word-context matrix using frequency statistics

TF-IDF (Term Frequency-Inverse Document Frequency)

TF = (count of term in doc) / (total terms in doc)
IDF = log(N / df)
TF-IDF = TF * IDF

Example:

  • Term “deep” appears 3 times in a doc of 100 words
  • Appears in 5 of 1000 docs
    TF = 3 / 100 = 0.03
    IDF = log10(1000 / 5) = log10(200) ≈ 2.30
    TF-IDF = 0.03 * 2.30 ≈ 0.069
    

PMI (Pointwise Mutual Information)

PMI(w, c) = log( P(w, c) / (P(w) * P(c)) )

Example:

  • P(w, c) = 0.001; P(w) = 0.01; P(c) = 0.005
    PMI = log10(0.001 / (0.01 * 0.005)) = log10(0.001 / 0.00005) = log10(20) ≈ 1.30
    
  • TF-IDF: good for document classification/search
  • PMI: useful for word similarity
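
A quick check of the two worked examples above (both use base-10 logs, as in the notes):

import math

def tf_idf(term_count, doc_length, num_docs, doc_freq):
    tf = term_count / doc_length
    idf = math.log10(num_docs / doc_freq)          # base-10 log, as in the example
    return tf * idf

def pmi(p_wc, p_w, p_c):
    return math.log10(p_wc / (p_w * p_c))          # base-10 log, as in the example

print(tf_idf(3, 100, 1000, 5))      # ~0.069
print(pmi(0.001, 0.01, 0.005))      # ~1.30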

Word-Context Matrix

A co-occurrence matrix where:

  • Rows = words
  • Columns = context words (e.g., 5 words before/after)
  • Values = count or normalized metric (PMI, TF-IDF)

Handling Long-Distance Dependencies

Problem: Words can depend on faraway tokens

Solution: Recurrent Neural Networks (RNN)

  • Include feedback loop to maintain internal state
  • Struggles with long-term dependencies due to vanishing gradients

LSTM (Long Short-Term Memory):

  • Introduces memory cells and gates (input, forget, output)
  • Maintains long-term context using additive updates

CS 485 - Robotics

Note: This class was actually so confusing, so I don’t have many detailed notes.


Rigid Body Transformations

Rigid body transformations involve orientation and position. These include translations and rotations, which are typically represented as matrix transformations.

Example of matrix representation [R|t] (2x3):

| R11 R12 t1 |
| R21 R22 t2 |

Where R is a 2x2 rotation matrix and t is the translation vector.

Pose = Position + Orientation


Global Warping

Applies a fixed set of operations (e.g., rotate, scale) to every point in space, usually represented as matrix operations.

2x2 Matrix Examples:

  • Scaling:
    | sx  0 |
    | 0  sy |
    
  • Rotation (angle θ):
    | cos(θ) -sin(θ) |
    | sin(θ)  cos(θ) |
    
  • Reflection (over y-axis):
    | -1  0 |
    |  0  1 |
    
  • Shear (x-direction):
    | 1  shx |
    | 0   1  |
    

Translation in Homogeneous Coordinates

Translation requires an extra coordinate (homogeneous coordinate).

3x3 Translation Matrix:

| 1  0  tx |
| 0  1  ty |
| 0  0   1 |

A transformation using homogeneous coordinates is considered affine if the last row is [0, 0, 1].

A homography is a projective transformation where the last row is [e, f, 1].


Layers of Transformation

| Type            | DOF | Preserves |
|-----------------|-----|-----------|
| Translation     | 2   | Straight lines, parallelism, angles, lengths, orientation |
| Rigid/Euclidean | 3   | Straight lines, parallelism, angles, lengths |
| Similarity      | 4   | Straight lines, parallelism, angles |
| Affine          | 6   | Straight lines, parallelism |
| Projective      | 8   | Straight lines |

Matrix Representations:

  • Translation:
    | 1  0  tx |
    | 0  1  ty |
    | 0  0   1 |
    
  • Rigid:
    | cos(θ) -sin(θ) tx |
    | sin(θ)  cos(θ) ty |
    |   0       0     1 |
    
  • Similarity:
    | s*cos(θ) -s*sin(θ) tx |
    | s*sin(θ)  s*cos(θ) ty |
    |     0         0     1 |
    
  • Affine:
    | a  b  tx |
    | c  d  ty |
    | 0  0   1 |
    
  • Projective:
    | a  b  tx |
    | c  d  ty |
    | e  f   1 |
    

Inverse Kinematics

You know the end-effector’s position and want to determine the joint angles or movements to get there.

  • Solving triangle geometry backwards.
  • Adds complexity in 3D due to orientation.

TF (Transform Library in ROS): Handles frame tracking and transformations in ROS.


Motion Control

Turns high-level goals into actuator commands.

  • Controller sends commands to actuators to affect system state.

Types:

  • Open-loop: No feedback from environment (no sensors).
  • Closed-loop: Feedback from sensors is used to update actions.

PID Controller (Proportional-Integral-Derivative)

A closed-loop system that combines three components:

  • Proportional (P): Based on current error
  • Integral (I): Accumulated past error
  • Derivative (D): Rate of change of error

PID Equation:

u(t) = K_p * e(t) + K_i * ∫e(t)dt + K_d * de(t)/dt

Where:

  • u(t) is the control output
  • e(t) is the error at time t
  • K_p, K_i, K_d are tuning constants

Example:

  • Desired position = 10 units
  • Current = 7 units → error = 3
  • Proportional term = Kp * 3
  • Integral term = Ki * total accumulated error
  • Derivative = Kd * (change in error)
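
A minimal C sketch of a discrete PID update, assuming a toy plant where the control output directly sets velocity; the gain values here are arbitrary, not tuned:

#include <stdio.h>

/* Discrete PID step: error history is reduced to a running integral
   and the previous error (for the derivative term). */
typedef struct {
    double kp, ki, kd;
    double integral;
    double prev_error;
} PID;

double pid_step(PID *pid, double setpoint, double measured, double dt) {
    double error = setpoint - measured;
    pid->integral += error * dt;
    double derivative = (error - pid->prev_error) / dt;
    pid->prev_error = error;
    return pid->kp * error + pid->ki * pid->integral + pid->kd * derivative;
}

int main(void) {
    PID pid = { .kp = 1.0, .ki = 0.1, .kd = 0.05 };
    double position = 7.0, setpoint = 10.0, dt = 0.1;
    for (int i = 0; i < 5; i++) {
        double u = pid_step(&pid, setpoint, position, dt);
        position += u * dt;            /* toy plant: velocity = control output */
        printf("step %d: u=%.3f pos=%.3f\n", i, u, position);
    }
    return 0;
}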

Motion Planning

Motion control alone doesn’t work in complex environments.

ROS Global Planner:

  • Uses A* or Dijkstra’s algorithm on a global costmap
  • Creates a global path with waypoints

ROS Local Planner:

  • Uses local costmap and dynamic window approach
  • Computes small, feasible trajectories around obstacles

Combine both: Global planner gives long-term direction; local planner handles short-term pathing and obstacle avoidance.

Maze Solving Algorithms:

  • BFS
  • DFS
  • Heuristic Informed (A*)
  • Optimal Informed (Dijkstra’s)
  • Iterative Deepening

Vertical Cell Decomposition (VCD):

  • Split environment vertically between obstacles
  • Build connections between accessible vertical strips

Example of VCD:

  • Obstacles represented as polygons
  • Extend vertical lines up/down from each vertex
  • Connect open areas → build graph

Other Planning Tools:

  • Visibility Graph: Straightest possible path, but may lead to sharp turns
  • Voronoi Diagram: Equidistant from obstacles → safer but longer path

Search Strategies:

  • Greedy Best-First: Always choose the node closest to the goal (not always optimal)
  • Uniform Cost Search (UCS): Expands path with lowest cost (same as BFS if all costs equal)
  • A*: Combines UCS and Greedy using cost + heuristic

Filtering (State Estimation)

Assumptions:

  • Markov: Only the previous state matters
  • Stationary process: Observation probabilities are constant
  • Prior state: Initial state is known or estimated

Types:

  • Discrete: Hidden Markov Models (HMM)
  • Continuous: Kalman Filter (linear + Gaussian)
  • Non-Gaussian: Particle Filter (samples possible states)

Joint Probability Distribution:

P(x_1, x_2, ..., x_t, z_1, z_2, ..., z_t) = P(x_1) * Π P(x_t | x_{t-1}) * Π P(z_t | x_t)

Prediction Step:

P(x_t | z_{1:t-1}) = ∫ P(x_t | x_{t-1}) * P(x_{t-1} | z_{1:t-1}) dx_{t-1}

Update Step:

P(x_t | z_{1:t}) ∝ P(z_t | x_t) * P(x_t | z_{1:t-1})

CS 367 - Systems Programming

Bitwise Operations

Everything in computers is in bits (1s and 0s), which are easy to store due to their binary nature (on/off).

Bitwise operations are performed on individual bits:

  • Shift operations
  • Bit masking
  • Logical operations (AND, OR, NOT, XOR)

Logical Shift: Used with unsigned integers. Leftmost bits are filled with 0s after a right shift.

Arithmetic Shift: Used with signed integers. Leftmost bits are filled with the sign bit (1 for negative, 0 for positive) to preserve the sign.

Example:

Logical Right Shift: 1100 0000 >> 2 = 0011 0000
Arithmetic Right Shift: 1100 0000 >> 2 = 1111 0000 (preserve sign)
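
A small C demo of the difference (note that right-shifting a signed value is technically implementation-defined in C, though mainstream compilers perform an arithmetic shift):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t u = 0xC0;       /* 1100 0000, unsigned */
    int8_t  s = -64;        /* same bit pattern 1100 0000, signed */

    /* Logical right shift: unsigned values shift in zeros. */
    printf("logical:    0x%02X\n", (uint8_t)(u >> 2));   /* 0x30 = 0011 0000 */

    /* Arithmetic right shift: signed values shift in copies of the sign bit. */
    printf("arithmetic: 0x%02X\n", (uint8_t)(s >> 2));   /* 0xF0 = 1111 0000 */
    return 0;
}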

Memory and Addressing

Memory is like a large array of bytes. Programs refer to data using addresses. Each process is assigned a private address space.

  • Word size: Number of bits CPU can address (e.g., 64-bit → 2^64 bytes)
  • Byte-ordering:
    • Little endian: least significant byte stored at the lowest address
    • Big endian: most significant byte stored at the lowest address

Example:

Value: 0x12345678
Little endian: 78 56 34 12
Big endian:    12 34 56 78
  • RISC: All instructions are the same (fixed) size
  • Negative values are represented using two’s complement
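
A small C check that prints the bytes of 0x12345678 in memory order, which reveals the machine's byte ordering:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t value = 0x12345678;
    unsigned char *bytes = (unsigned char *)&value;

    /* Print the bytes in increasing address order: a little-endian machine
       (e.g., x86) prints 78 56 34 12; a big-endian machine prints 12 34 56 78. */
    for (int i = 0; i < 4; i++)
        printf("%02X ", bytes[i]);
    printf("\n");

    printf("%s-endian\n", bytes[0] == 0x78 ? "little" : "big");
    return 0;
}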

Integer Casting & Overflow

  • Casting up:
    • If unsigned → pad with 0
    • If signed → pad with MSB
  • Casting down:
    • Truncation (may lose data)

Overflow Detection:

  • Unsigned: Overflow if sum < operands
  • Signed: Overflow if sign changes incorrectly

Integer Multiplication via Bit Shift:

x * 8 = x << 3
x * 12 = (x << 3) + (x << 2)

Integer Division via Bit Shift:

x / 8 = x >> 3 (only works for powers of 2)

Floating Point Representation

Notation: (-1)^s * M * 2^E

7-bit floating point layout:

  • 1 bit sign
  • 3 bit exponent
  • 3 bit fraction

Bias: bias = 2^(e-1) - 1

Example: 6.5

  • Binary: 110.1 = 1.101 x 2^2
  • Sign = 0
  • Frac = 101
  • Exp = 2 + 3 (bias) = 5 = 101
  • Final: 0 101 101

Types of Floating Point Values:

  • Normalized: 00...0 < exp < 11...1
  • Special: exp = 111 → INF/NAN
  • Denormalized: exp = 000

Denormalized Example: 3/16

  • Binary: 0.0011 = 1.1 x 2^-3
  • Bias = 3, Exp = -3 → 0 → denormalized
  • New definitions:
    • E = 1 - bias = -2
    • M = 0.11 → Frac = 110
  • Final: 0 000 110

Special Values:

  • 1 111 000 = -INF
  • 0 111 000 = +INF
  • 0 111 001 = NaN

Floating Point Multiplication:

(-1)^s1 * m1 * 2^e1 × (-1)^s2 * m2 * 2^e2
= (-1)^(s1 XOR s2) * (m1 × m2) * 2^(e1 + e2), then renormalize and round

Structs & Memory Alignment

Structs are laid out in memory in declaration order, and alignment rules dictate that data types must start at offsets that are multiples of their size.

Example:

struct c {
  char c;      // offset 0 (1 byte)
  int a[2];    // offset 4 & 8 (each int = 4 bytes)
  long b;      // offset 16 (8 bytes)
};
  • Total size = 24 bytes (padded to be multiple of 8)

Optimized Layout:

struct c {
  long b;      // offset 0
  int a[2];    // offset 8 & 12
  char c;      // offset 16
};
  • Padding only at the end
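
You can verify both layouts with offsetof and sizeof (struct names below are just for the demo). Note that on a typical 64-bit ABI both structs still occupy 24 bytes, because the whole struct is padded to a multiple of the 8-byte alignment of long; the optimized layout just moves all padding to the end:

#include <stdio.h>
#include <stddef.h>

struct original  { char c; int a[2]; long b; };
struct optimized { long b; int a[2]; char c; };

int main(void) {
    printf("original:  c=%zu a=%zu b=%zu size=%zu\n",
           offsetof(struct original, c), offsetof(struct original, a),
           offsetof(struct original, b), sizeof(struct original));
    printf("optimized: b=%zu a=%zu c=%zu size=%zu\n",
           offsetof(struct optimized, b), offsetof(struct optimized, a),
           offsetof(struct optimized, c), sizeof(struct optimized));
    return 0;
}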

Dynamic Allocation:

  • Double-word aligned (block addresses are multiples of 8)
  • Header contains size & allocation status

Allocation Strategies:

  • First fit, next fit, best fit
  • Free block merging to reduce fragmentation

Garbage Collection:

  • Memory blocks as nodes, pointers as edges
  • Unreachable nodes are removed

Exception Handling

Control Flow: Handles unexpected events (interrupts, system calls, faults)

Types of Exceptions:

  • Interrupts: Asynchronous (e.g., I/O devices)
  • Traps: System calls (e.g., read, fork, exec)
  • Faults: Illegal memory access (e.g., segmentation fault)
  • Aborts: Unrecoverable hardware errors

Microkernel vs Monolithic Kernel:

  • Microkernel: Minimal functionality in the kernel; most services run in user space (e.g., Minix, QNX; Windows and macOS are hybrid designs)
  • Monolithic: All OS services run in kernel space; fast and customizable, but a single bug can crash the whole kernel (e.g., Linux)

Context Switching: Kernel maintains:

  • Page table, file table, process table, registers, stack

Process Lifecycle:

  • fork() → Creates child process
  • exit() → Terminates process
  • wait() or waitpid() → Parent reaps child

Zombie State: Child terminated but not yet reaped

Signals: Notify process of events

  • Used for interprocess communication
  • Propagated to process groups (e.g., pipelines)

sigaction Example:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void handler(int sig) {
    printf("Caught signal %d\n", sig);   /* note: printf is not async-signal-safe; demo only */
}

int main(void) {
    struct sigaction sa;
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);            /* don't block extra signals inside the handler */
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);
    pause();                             /* wait here until Ctrl-C delivers SIGINT */
    return 0;
}

Race Condition Prevention: Block signals before forking, unblock afterward


Virtual Memory

Virtual memory gives each process an isolated address space. Pages are swapped in/out of disk.

  • Page fault = page not in memory → OS brings it in
  • Page Table maps virtual pages to physical frames
  • TLB (Translation Lookaside Buffer): Caches page table entries

Translation:

  • Virtual Address → VPN + Page Offset
  • VPN → TLB (or page table on a miss) → PPN; Physical Address = PPN + Page Offset

Full Virtual Address Translation Example

Simulates how a virtual address is translated to a physical address, and then used to access the cache, TLB, page table, and finally main memory.


System Configuration

Parameter Value
Virtual Address Size 16 bits
Physical Address Size 16 bits
Page Size 4 KB = 2¹² bytes
Offset Bits 12 bits
VPN Bits 4 bits
TLB Entries 4
Cache Lines 4
Cache Block Size 1 word (4 bytes)

Example Virtual Address

We’re translating:

Virtual Address (VA) = 0x1234
Binary: 0001 0010 0011 0100
  • VPN (Virtual Page Number) = first 4 bits = 0001 = 0x1
  • Page Offset = last 12 bits = 0x234

Step 1: TLB Lookup

TLB Table (before lookup):

Index VPN PPN Valid
0 0x0 0x3 1
1 0x2 0x4 1
2 0x3 0x6 1
3 0x4 0x2 1
  • Lookup VPN = 0x1
  • Not found → TLB Miss

Step 2: Page Table Lookup

Page Table:

VPN (Index) PPN Valid
0x0 0x3 1
0x1 0xA 1 ✅
0x2 0x4 1
0x3 0x6 1
  • VPN = 0x1 → PPN = 0xA
  • OS loads TLB entry: (VPN 0x1 → PPN 0xA)

Step 3: Construct Physical Address

Combine:

  • PPN (Physical Page Number) = 0xA
  • Offset = 0x234
Physical Address (PA) = (PPN << 12) | Offset
                      = (0xA << 12) | 0x234
                      = 0xA000 | 0x234
                      = 0xA234

Step 4: Cache Lookup

Let’s assume a direct-mapped cache with 4 lines:

  • Cache index bits: 2 bits → 2^2 = 4 lines
  • Block offset bits: 2 bits (1 word = 4 bytes per block)

Break PA (0xA234):

  • Binary: 1010 0010 0011 0100
  • Tag: upper 12 bits = 1010 0010 0011 (0xA23)
  • Index: bits 2–3 = 01 → Line 1
  • Offset: bits 0–1 = 00 (byte within the 4-byte word)

Cache Table:

Line Valid Tag Data
0 1 0xA20 0xFFFF
1 1 0xA21 0x1234
2 0
3 1 0xB10 0xF00D
  • Index = 1, Tag in PA = 0xA23 ≠ Tag in cache = 0xA21 → Cache Miss

Step 5: Access Main Memory

  • Cache miss → Load data from physical memory at 0xA234
  • Cache line 1 is updated with tag 0xA23 and value fetched

Summary Flow

Virtual Address = 0x1234
 ├──> TLB Lookup
 │     └── MISS
 │
 ├──> Page Table Lookup
 │     └── VPN 0x1 → PPN 0xA
 │
 ├──> Construct PA = 0xA234
 │
 ├──> Cache Lookup @ Index 1
 │     └── MISS (tag mismatch)
 │
 └──> Load from Physical Memory[0xA234] → Update Cache

Note: The cache is indexed and tagged using physical addresses.
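
A minimal C sketch of the VPN/offset split and physical-address construction from the example above (the PPN is hard-coded where the TLB/page-table lookup would go):

#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 12                    /* 4 KB pages */
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1)

int main(void) {
    uint16_t va  = 0x1234;
    uint16_t vpn = va >> OFFSET_BITS;     /* 0x1 */
    uint16_t off = va & OFFSET_MASK;      /* 0x234 */

    /* Pretend the TLB/page table returned PPN 0xA for VPN 0x1. */
    uint16_t ppn = 0xA;
    uint16_t pa  = (uint16_t)((ppn << OFFSET_BITS) | off);   /* 0xA234 */

    printf("VA=0x%04X -> VPN=0x%X, offset=0x%03X, PA=0x%04X\n",
           va, vpn, off, pa);
    return 0;
}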

Assembly Language

Compiled code translates to assembly, which consists of simple operations and direct memory access.

Operand Types:

  • $ = constant
  • % = register
  • 0xADDR = memory address

Example:

movq %rax, %rbx    # copy rax to rbx

Registers:

  • 8-bit: %al, %bl, %cl, %dl
  • 16-bit: %ax, %bx, %cx, %dx
  • 32-bit: %eax, %ebx, %ecx, %edx
  • 64-bit: %rax, %rbx, %rcx, %rdx

Jump Example:

start:
  cmp $1, %eax
  je equal
  jmp end

equal:
  mov $1, %ebx
end:

Caller-saved registers: the callee may clobber them, so the caller must save them if it needs their values after the call.
Callee-saved registers: the callee must save and restore them before returning.

C Function vs Assembly:

int add(int a, int b) {
  return a + b;
}

add:
  mov %edi, %eax   # copy a (first int argument) into the return register
  add %esi, %eax   # add b (second int argument)
  ret

Memory Locality and Hierarchy

Temporal Locality: Reuse of recently accessed data.
Spatial Locality: Accessing nearby data (e.g., arrays).

Hierarchy:

  1. Register – Fastest
  2. Cache – L1/L2
  3. Main Memory – DRAM
  4. Disk – Persistent storage

Static RAM (SRAM): Used for caches; fast but costly.
Dynamic RAM (DRAM): Used for main memory; dense and cheaper.

Performance Metrics:

  • Transfer latency
  • Rotational latency
  • Seek time

Cache Example

Address broken into segments:

  • Tag: Uniquely identifies block
  • Line: Row in cache
  • Offset: Byte within block

Example Table:

Address: 0x1234AB
Tag:     0x12
Line:    0x34
Offset:  0xAB

Formulas:

  • 2^(offset bits) bytes per block
  • 2^(line bits) sets in the cache
  • 2^(tag bits) memory blocks map to each set
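
A small C sketch extracting the fields from the 0x1234AB example with shifts and masks (the 8/8/8 bit split is an assumption that matches the table above):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t addr   = 0x1234AB;
    uint32_t offset = addr & 0xFF;          /* low 8 bits  -> 0xAB */
    uint32_t line   = (addr >> 8) & 0xFF;   /* next 8 bits -> 0x34 */
    uint32_t tag    = addr >> 16;           /* high bits   -> 0x12 */
    printf("tag=0x%02X line=0x%02X offset=0x%02X\n", tag, line, offset);
    return 0;
}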

CS 471 - Operating Systems

Note: Half of OS is a repeat of CS 367 because Computer Engineering students also take this course.


OS Refresher

  • Von Neumann architecture
  • Virtualization (Virtual Memory)
  • Provides system calls as standard library
  • Resource manager
  • Multithreading
  • Volatile vs Non-volatile memory
  • Abstractions
  • Provides protections
  • Isolates processes

Processes

  • Program: Code stored on disk
  • Process: When the code is loaded and running

States:

  • Running
  • Ready
  • Blocked
  • Terminated

Concurrency

  • Each thread has its own set of registers
  • Context switch with threads and processes works very similarly
  • State of each process → moved to Process Control Block (PCB)
  • State of each thread → moved to Thread Control Block (TCB)
  • Thread context switches do not usually involve paging (same address space)
  • Global variables must be protected to ensure mutual exclusion
  • Race conditions can lead to nondeterministic behavior

Direct Execution

  1. Create entry on process list
  2. Allocate memory for the program
  3. Load program into memory
  4. Set up stack and clear registers
  5. Run main
  6. Return from main
  7. Free memory
  8. Remove from process list

Problem: No time-sharing


Limited Direct Execution

  • Process runs in user mode (can access program files, registers)
  • For I/O: switch to kernel mode (has unrestricted access)

Trap Instruction → switch to kernel
Return from Trap → switch back to user mode

Execution Flow:

  1. Create entry on process list
  2. Allocate memory for the program
  3. Load program into memory
  4. Set up stack and clear registers
  5. Run main
  6. System call (trap)
  7. Save registers to stack
  8. Kernel mode
  9. Handle trap
  10. Return to user mode
  11. Restore registers
  12. Return from main
  13. Free memory
  14. Remove from process list

When to Context Switch

  • Option 1 (Cooperative): Switch when the process makes a system call
    • Problem: If no system calls, a program may hog the CPU
  • Option 2 (Non-cooperative): Timer interrupts trigger a switch

Two Saves:

  1. Timer interrupt → Hardware saves user register state using kernel stack
  2. OS switches process → Kernel register state saved by OS

CPU Scheduling

Assumptions:

  • Jobs run for a certain time
  • All arrive at the same time
  • Once started, runs to completion
  • No I/O
  • Runtime is known

Scheduling Algorithms:

  • FCFS: Simple, but short tasks wait for long ones
  • Shortest Job First: Optimal if jobs known in advance
  • Shortest Time to Completion: Preemptive version of SJF, allows shorter tasks to interrupt
  • Round Robin: Fixed time slice, good for time-sharing but can delay long tasks
  • With I/O: While one task does I/O, another can run

Real World: OS doesn’t know how long each task takes


Multilevel Feedback Queue (MLFQ)

  • Multiple priority queues
  • Jobs move up/down levels
  • Prevents starvation (but can still happen with many high-priority tasks)

Proportional Share Scheduling:

  • Each process gets CPU time in proportion to its assigned share (e.g., tickets or weights)
  • Ensures all processes get some CPU

Lottery Scheduling:

  • Tasks receive a number of tickets
  • Random winner per scheduling round
  • Prevents starvation using randomness
  • Example: Task A gets tickets 1-10, Task B gets 11-20

Linux Scheduler:

  • Completely Fair Scheduler (CFS)
  • Priority-based
  • Tracks virtual runtime
  • Uses Red-Black Tree to manage scheduling efficiently

Multiprocessor Scheduling:

  • Each CPU has its own cache
  • Requires hardware support

Memory: Base and Bounds (Dynamic Relocation)

  • Two registers: Base and Bounds
  • Physical Address = Virtual Address + Base
  • Bounds = upper memory limit
  • Prevents processes from accessing memory outside their segment → segfault
  • Base and bounds updated on each context switch
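
A minimal C sketch of the check the MMU performs under base and bounds (the base/bounds values are made up):

#include <stdio.h>
#include <stdint.h>

/* The MMU adds the base register to every virtual address and
   faults if the address exceeds the bounds. */
int translate(uint32_t va, uint32_t base, uint32_t bounds, uint32_t *pa) {
    if (va >= bounds) {
        fprintf(stderr, "segfault: VA 0x%X out of bounds\n", va);
        return -1;                 /* the OS would raise a fault here */
    }
    *pa = va + base;
    return 0;
}

int main(void) {
    uint32_t pa;
    if (translate(0x100, 0x4000, 0x1000, &pa) == 0)
        printf("VA 0x100 -> PA 0x%X\n", pa);     /* 0x4100 */
    translate(0x2000, 0x4000, 0x1000, &pa);      /* out of bounds -> fault */
    return 0;
}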

Livelock

  • Two processes constantly changing state without progress
  • Solution: add random backoff/delays before retrying so the processes stop mirroring each other

Data Transfer

  • Programmed I/O: CPU actively manages data transfer
  • DMA (Direct Memory Access): CPU writes data to memory, DMA controller handles transfer

Disk Performance

Latency = Seek Time + Rotational Latency + Transfer Time

Disk Scheduling:

  • FIFO: Worst performance
  • Shortest Positioning Time First (SPTF): Fast, but risks starvation
  • SCAN: Sweep from one end, services along the way (middle gets priority)
  • C-SCAN: Sweep one direction only, resets without servicing on return

RAID (Redundant Array of Inexpensive Disks)

Appears as one disk to OS, provides performance, capacity, and reliability.

  • RAID-0 (Striping):
    • No redundancy
    • Capacity = N
    • No fault tolerance
  • RAID-1 (Mirroring):
    • Store exact copy
    • Capacity = N/2
    • Fault-tolerant
  • RAID-4 (Parity):
    • One disk holds parity info
    • Capacity = N - 1
    • Higher latency for random writes
  • RAID-5 (Rotating Parity):
    • Parity spread across disks
    • Reduces write latency

CS 475 - Concurrent and Distributed Systems


Overview

  • Concurrent: Multiple tasks on the same computer, using scheduling to switch between them.
  • Parallelism: Multiple CPUs running tasks truly simultaneously.
  • Distributed: Multiple computers sharing a task over a network.

Multithreading

  • Each thread has its own stack (within the process’s address space) and its own control flow.
  • Shared memory (like global variables) → risk of race conditions.
  • Threading is less expensive (doesn’t require its own page table).

Race Condition Example: Thread 1 and Thread 2 both read value 5. Each subtracts 1. Both write 4 back → expected result was 3.

Solution: Use locks to restrict access to shared memory.
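
A minimal pthreads sketch of the fix: both threads decrement a shared counter, but the mutex makes the read-modify-write atomic (compile with -pthread):

#include <pthread.h>
#include <stdio.h>

int counter = 5;                       /* shared state */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *decrement(void *arg) {
    pthread_mutex_lock(&lock);         /* only one thread reads+writes at a time */
    counter = counter - 1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, decrement, NULL);
    pthread_create(&t2, NULL, decrement, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", counter); /* always 3 with the lock in place */
    return 0;
}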


Locks and Synchronization

  • Atomic Instruction: Read-modify-write performed as one indivisible step.
  • Basic Lock Functions:
    • acquire()
    • release()

C-style Mutex Locks

Ways to implement locks:

  1. Disable Interrupts:
    • Only viable for kernel code.
    • Not safe (can be abused).
  2. Peterson’s Algorithm:
    • Works for 2 threads.
    • Threads alternate.
    • Lots of waiting.
  3. Test and Set (Spinlock):
    • Spin until the lock becomes available.
    • Problem: still has race conditions.
  4. Atomic Test and Set:
    • Atomic check + acquire.
    • Disable interrupts only when acquiring.
  5. Compare and Swap:
    • If lock is 0 → set to 1, else wait.
    • Atomic by hardware.
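
A minimal sketch of option 5 using C11 atomics: atomic_compare_exchange_weak sets the lock to 1 only if it is currently 0, otherwise the caller keeps spinning:

#include <stdatomic.h>

/* Spinlock built on compare-and-swap. */
typedef struct { atomic_int locked; } spinlock_t;   /* initialize locked to 0 */

void spin_acquire(spinlock_t *l) {
    int expected = 0;
    while (!atomic_compare_exchange_weak(&l->locked, &expected, 1)) {
        expected = 0;   /* a failed CAS overwrites 'expected'; reset and retry */
    }
}

void spin_release(spinlock_t *l) {
    atomic_store(&l->locked, 0);
}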

Condition Variables

Used to control the order of execution among threads.

Bounded Buffer Problem:

  • Producer only adds if space is available.
  • Consumer only removes if buffer not empty.

Solution:

  • Use wait() and signal().
  • Always put wait() in a while loop to avoid spurious wakeups and race conditions.
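
One way to realize wait()/signal() is with POSIX condition variables; a compact bounded-buffer sketch (it treats the buffer as a stack purely for brevity):

#include <pthread.h>

#define CAP 8
int buf[CAP], count = 0;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

void put(int item) {
    pthread_mutex_lock(&m);
    while (count == CAP)                   /* while-loop guards against */
        pthread_cond_wait(&not_full, &m);  /* spurious wakeups          */
    buf[count++] = item;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&m);
}

int get(void) {
    pthread_mutex_lock(&m);
    while (count == 0)
        pthread_cond_wait(&not_empty, &m);
    int item = buf[--count];
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&m);
    return item;
}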

Semaphores

  • Work as both locks and condition variables.
  • Binary or counting semaphores.
#include <semaphore.h>

sem_t s;
sem_init(&s, 0, 1); // binary semaphore (initial value 1)
sem_wait(&s);       // acquire
sem_post(&s);       // release

Concurrent Data Structures

  • Concurrent Queue: One lock for the entire queue.
  • Concurrent Hashtable: One lock per chain (if chaining).
  • Concurrent Linked List: Two locks for adjacent nodes.

Distributed Systems

Architecture

  • RPC (Remote Procedure Call)
  • MOM (Message-Oriented Middleware)

Message Invocation Models

  • At least once – e.g., SunRPC
  • Exactly once – like a local function
  • At most once – e.g., Go/Java

Idempotency

  • An operation that has the same effect whether it is executed once or many times.

MapReduce

  • Input/output (I/O) is the bottleneck
  • Automates parallelization and distribution across machines
  • Input: (key, value)
  • map() → intermediate pairs
  • reduce() → merges all intermediate pairs

Time Synchronization

  • Ensure that requests happen in correct order

Techniques:

  • NTP (Network Time Protocol): syncs clocks over the network to servers ultimately disciplined by UTC sources (GPS/radio)
  • Cristian’s Algorithm: uses round-trip time to sync clocks
  • Lamport Clocks: happened-before logic using counters
  • Totally Ordered Multicast:
    • Broadcast update
    • Receive ACKs
    • Apply only when everyone has received

Vector Clocks

  • Vector of integers (one per process)
  • Captures causality

Rules:

  • Increment your own clock
  • Send your vector with messages
  • Upon receiving, merge vectors
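
A minimal C sketch of these rules for a fixed set of N processes (N = 3 is an assumption for the example):

#include <stdio.h>

#define N 3   /* number of processes */

/* Rule 1: increment your own entry on a local event or before sending. */
void vc_tick(int vc[N], int self) { vc[self]++; }

/* Rule 3: on receive, take element-wise maximums, then tick. */
void vc_merge(int vc[N], const int msg[N], int self) {
    for (int i = 0; i < N; i++)
        if (msg[i] > vc[i]) vc[i] = msg[i];
    vc_tick(vc, self);
}

int main(void) {
    int p0[N] = {0}, p1[N] = {0};
    vc_tick(p0, 0);          /* P0 local event: p0 = [1,0,0] */
    vc_tick(p0, 0);          /* P0 sends:       p0 = [2,0,0] */
    vc_merge(p1, p0, 1);     /* P1 receives:    p1 = [2,1,0] */
    printf("P1 clock: [%d,%d,%d]\n", p1[0], p1[1], p1[2]);
    return 0;
}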

System Models

  • Synchronous: Known message delay, global clock
  • Asynchronous: No timing assumptions

Failure Models

  • Crash-Fail: Deterministic, node crashes
  • Partition: Node split/network disconnection
  • Byzantine: Arbitrary failures (even malicious)

Distributed Mutual Exclusion

Assumptions:

  • Reliable network
  • Asynchronous processes
  • Crash-fail only

Approaches:

  • Centralized Lock Server:
    • All requests go to a single server
    • Server grants access
    • Problem: single point of failure
  • Ring-based Algorithm:
    • Token is passed
    • No central server
    • Problem: ring breakage
  • Shared Priority Queue (Lamport Queues):
    • Timestamps
    • Floods messages → overhead
  • Ricart & Agrawala:
    • Combines request and release
    • Optimizes Lamport’s idea

Distributed Transactions

  • Transactions must be atomic (all or nothing)
  • ACID properties

Two-phase locking:

  1. Growing phase: acquire locks (no releases)
  2. Shrinking phase: release locks (no new acquisitions)

Coordinator:

  • One node starts transaction
  • Decides to commit/abort

Two-Phase Commit (2PC)

  1. Voting Phase:
    • All nodes vote commit or abort
    • Once voted commit, cannot revert
  2. Commit Phase:
    • Actually apply changes

Failure Handling:

  • Participant crash → Ask coordinator for result
  • No response → Default to abort
  • Coordinator crash → Elect a new one or use previously sent result

Three-Phase Commit (3PC)

Adds a step to improve liveness:

  1. Vote
  2. Pre-commit broadcast
  3. Commit notice

Helps if coordinator crashes → everyone knows the decision.

Problem:

  • If half receive, half don’t → partition
  • Majority continues, minority syncs later

CAP Theorem

You can have only two out of:

  • Consistency
  • Availability
  • Partition Tolerance

Partitioning is non-negotiable in distributed systems.

ZooKeeper: Coordination service used for leader election, configuration, and locking in distributed systems.

CS 483 - Analysis of Algorithms

Note: For this class — just do a bunch of Leetcode to build intuition. Then, learn how to write proofs. That’s it. This is a very applied class; you won’t get much out of just reading theory.


Representative Problems

  1. Interval Scheduling Problem
    • Solved using greedy algorithms
  2. Weighted Interval Scheduling
    • Similar to interval scheduling but with priorities
    • Solved using dynamic programming
  3. Bipartite Matching
    • Pairings and maximum matchings in bipartite graphs
  4. Network Flow
    • Flow problems on graphs (e.g., Ford-Fulkerson)
  5. NP-Complete Problems
    • Hard to solve efficiently
    • Easy to verify solutions
    • Example: Competitive Facility Location
      • Find optimal spacing among facilities

Complexity Classes

  • NP-Complete: Solutions are easy to verify but potentially hard to compute
  • PSPACE-Complete: Solvable with polynomial memory; believed even harder than NP-complete (no known way to verify solutions efficiently)

Algorithm Paradigms

Divide and Conquer

  • Break problem into subproblems
  • Solve recursively
  • Combine solutions

Greedy Algorithms

  • Construct solution step-by-step
  • Always choose the best decision at the current step
  • Hope that local optimum → global optimum

Dynamic Programming

  • Consider all possible solutions, choose the best
  • Look for repeated subproblems
  • Two main approaches:
    • Memoization: Top-down + recursion + caching
    • Tabulation: Bottom-up + iterative + often array-based

When to use DP?

  • Recursive solutions with repeated calculations
  • Can transform recursion → memoization → tabulation
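
A small C example of the same problem (Fibonacci) solved both ways: fib_memo is top-down with a cache, fib_tab is bottom-up:

#include <stdio.h>

/* Memoization: top-down recursion with a cache of computed values. */
long long memo[64];
long long fib_memo(int n) {
    if (n <= 1) return n;
    if (memo[n] != 0) return memo[n];
    return memo[n] = fib_memo(n - 1) + fib_memo(n - 2);
}

/* Tabulation: bottom-up, fill an array starting from the base cases. */
long long fib_tab(int n) {
    long long table[64] = {0, 1};
    for (int i = 2; i <= n; i++)
        table[i] = table[i - 1] + table[i - 2];
    return table[n];
}

int main(void) {
    printf("%lld %lld\n", fib_memo(40), fib_tab(40));  /* both 102334155 */
    return 0;
}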

Graph Algorithms

  • BFS (Breadth-First Search)
  • DFS (Depth-First Search)
  • Dijkstra’s Algorithm (shortest path)
  • Topological Sort (for DAGs)
  • Minimum Spanning Tree (MST)
    • Prim’s Algorithm
    • Kruskal’s Algorithm

Proof Techniques

  • Lemma + Corollary + Subproblem + Algorithm format
  • Inductive Reasoning:
    • Base case
    • Inductive step (assume n)
    • Prove for n+1

CS 480 - Introduction to Artificial Intelligence

A combination of search, logic, learning, and reasoning. Examples and proofs help more than pure theory memorization.

Constraint Satisfaction Problems (CSPs)

  • Identification problem: We care if a state is a goal state, not how we got there.
  • Components:
    • Variables: Set of objects that will take on values
    • Domain: Set of all possible values for each variable
    • Constraints: Restrictions on variable values with respect to other variables
  • Complexity: Usually NP-hard
  • Typical solver: Backtracking

Arc Consistency:

  • Represent each undirected edge as two directed arcs
  • Maintain arcs such that for every value in variable A, there’s a consistent value in variable B
  • Can be thought of as an adjacency list

k-Consistency:

  • Extends arc consistency to sets of k variables

Forward Checking:

  • Eliminate inconsistent values ahead of time to reduce backtracking

Heuristics:

  • Minimum Remaining Value (MRV): Choose the variable with the fewest legal values
  • Least Constraining Value (LCV): Choose the value that eliminates the fewest choices for the neighboring variables

Tree-structured CSP:

  • Can be solved efficiently using topological sort

Local Search for CSPs:

  • Start from a random assignment
  • Iteratively reassign values to minimize conflicts
  • Constant space, but incomplete and suboptimal as the constraint-to-variable ratio increases

Example: Map Coloring

  • Variables: {WA, NT, SA, Q, NSW, V, T}
  • Domain: {Red, Green, Blue}
  • Constraint: Adjacent regions must not share the same color

Example: N-Queens (4x4)

  • Queens: Q1–Q4 each in different columns
  • Place Q1 in row 2, Q2 in row 4, Q3 in row 1, Q4 in row 3 → No queen attacks another

Peak Finding

Hill Climbing:

  • Always move towards increasing value
  • May get stuck at local maxima
  • Incomplete

Simulated Annealing:

  • Occasionally allows downhill moves to escape local maxima
  • Balances exploration and exploitation

Genetic Algorithm:

  • Local beam search with crossover and mutation
  • Explore multiple paths in parallel and favor the most promising

Search Problems

  • World State: All information about the environment
  • Search State: Minimal info needed for search algorithm

Uninformed Strategies:

Algorithm Complete Optimal Notes
Depth-First Search No No Can get stuck in loops
Breadth-First Search Yes Yes If all edges have equal cost
Uniform Cost Search Yes Yes Explores lowest cost path first
Iterative Deepening Yes Yes Combines BFS + DFS
Bidirectional Search Yes Yes If both searches use UCS

Informed Strategies:

Algorithm Complete Optimal Notes
Greedy Search No No May get stuck in loops
A* Search Yes Yes Requires admissible and consistent heuristic
  • Good heuristic: Always underestimates cost to goal

Minimax — Step-by-Step Example

            MAX
         /       \
      MIN         MIN
     /   \       /   \
    3     5     2     9
  1. Evaluate the left MIN:
    • Children: 3, 5 → MIN = 3
  2. Evaluate the right MIN:
    • Children: 2, 9 → MIN = 2
  3. MAX chooses the maximum of 3 and 2 → Best move = Left Subtree → Value = 3
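
A minimal recursive C sketch of minimax over this exact tree, storing only the leaf values:

#include <stdio.h>

/* Recursive minimax over a fixed-depth binary game tree stored as an
   array of leaf values (depth 2 here, matching the example above). */
int minimax(const int *leaves, int lo, int hi, int maximizing) {
    if (hi - lo == 1) return leaves[lo];              /* leaf node */
    int mid = (lo + hi) / 2;
    int left  = minimax(leaves, lo, mid, !maximizing);
    int right = minimax(leaves, mid, hi, !maximizing);
    if (maximizing) return left > right ? left : right;
    return left < right ? left : right;
}

int main(void) {
    int leaves[] = {3, 5, 2, 9};
    /* Root is MAX -> children are MIN -> leaves */
    printf("minimax value: %d\n", minimax(leaves, 0, 4, 1));  /* 3 */
    return 0;
}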

Alpha-Beta Pruning Example (using above tree)

            MAX
         /       \
      MIN         MIN
     /   \       /   \
    3     5     2     9
  • Alpha = -inf, Beta = inf
  • Left MIN: 3 (Beta = 3), then 5 → no pruning
  • Right MIN: 2 (Beta = 2) < Alpha (3) → PRUNE 9

Expectimax

  • MIN nodes replaced with average of possible outcomes
         MAX
       /     \
    Chance   Chance
    /   \     /   \
   3    5   2     9
  • Chance 1: (3+5)/2 = 4
  • Chance 2: (2+9)/2 = 5.5
  • MAX picks 5.5 → right branch

Sampling Methods in Bayes Nets

Bayes Net Example:

       Cloudy
       /    \
      v      v
   Rain    Sprinkler
      \      /
       v    v
     WetGrass

  • P(Cloudy) = 0.5
  • P(Rain | Cloudy) = 0.8
  • P(Sprinkler | Cloudy) = 0.1
  • P(WetGrass | Rain, Sprinkler) = 0.99

Prior Sampling

  • Generate samples from root → leaves
  • Sample: Cloudy = True → Rain = True → Sprinkler = False → WetGrass = True
  • Repeat for 1000 samples, estimate P(WetGrass=True)

Rejection Sampling

  • Generate prior samples, discard those not matching evidence (e.g., WetGrass=True)

Likelihood Weighting

  • Fix evidence (e.g., WetGrass=True)
  • Weight each sample based on probability of evidence given parents

Gibbs Sampling

  • Initialize all variables randomly
  • Repeatedly sample each non-evidence variable conditioned on Markov blanket

CS 455 - Networking

Internet Overview

  • The Internet is a global computer network interconnecting billions of users.
  • End systems: Devices like computers and servers.
  • ISPs (Internet Service Providers): Provide access to the internet.
    • Local ISP → Regional ISP → Tier 1 ISP

Packet Transmission

  • Data sent as packets.
  • Packets travel through links (fiber optics, coaxial cable, wireless) and packet switches.

Types of Network Delay

  1. Processing Delay: Time to examine packet header and check for errors.
  2. Queueing Delay: Time waiting in buffer before transmission.
  3. Transmission Delay: Time to push bits onto the link.
  4. Propagation Delay: Time to travel across the link.

Throughput = Your share of available bandwidth.


TCP/IP Model

Layer Protocols Notes
Application HTTP, SMTP, FTP, DNS End-to-end communication
Transport TCP, UDP End-to-end delivery. TCP: reliable, UDP: fast
Network IP Routing and forwarding between hosts
Link Ethernet, Wi-Fi Transmission between nodes
Physical Coaxial, Fiber Optic Signal transmission medium

Encapsulation:

[Link Header [Network Header [Transport Header [Application Header [Data]]]]]

Network Security

  • DoS Attack: Overwhelms target with traffic
  • Packet Sniffing: Monitors data
  • IP Spoofing: Fake IP addresses
  • Mitigation: Endpoint authentication, encryption

Application Layer

Models

  • Client-Server
  • Peer-to-Peer (P2P)

Socket

  • Interface for process-to-process communication

Key Considerations:

  • Reliability (data guaranteed?)
  • Throughput (rate of data)
  • Loss Tolerance (okay to drop packets?)
  • Timing (real-time required?)
  • Security (data integrity/authentication?)

DNS (Domain Name System)

  • Maps domain names to IP addresses
  • Uses UDP

DNS Hierarchy:

Client → Local DNS Resolver
         |
         ↓
     [Root Server]
         |
         ↓
     [TLD Server]
         |
         ↓
[Authoritative Server]
         |
         ↓
     Final IP Address

Search Types:

  • Iterative: Resolver walks through each level
  • Recursive: Resolver asks root, which queries each level

DNS Record Types:

  • A → Hostname to IP
  • NS → Domain to name server
  • CNAME → Alias to canonical
  • MX → Mail exchange

HTTP

  • Runs on TCP; stateless protocol (connections may be nonpersistent or persistent)
  • Cookies: Track sessions
  • Web Caches: Reduce latency and load

SMTP

  • Transfers mail over TCP
  • User Agent → Mail Server → Recipient Server
  • Retrieving/reading mail uses POP3, IMAP, or HTTP (webmail)

Transport Layer

Reliable Data Transfer

  • Checksum: Detect bit errors
  • ACK/NACK: Confirm or reject packets
  • Sequence Numbers: Ensure correct ordering
  • Timeouts: Retransmit if no ACK received
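
As a concrete example of checksumming, here is a sketch of the classic 16-bit ones'-complement Internet checksum (the style used by IP/UDP/TCP); the sample packet bytes are arbitrary:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Sum the data as 16-bit words, fold carries back in, then invert. */
uint16_t checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(data[i] << 8 | data[i + 1]);
    if (len & 1)                         /* pad an odd trailing byte */
        sum += (uint32_t)(data[len - 1] << 8);
    while (sum >> 16)                    /* fold carries into the low 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

int main(void) {
    uint8_t pkt[] = {0x45, 0x00, 0x00, 0x1C};
    printf("checksum = 0x%04X\n", checksum(pkt, sizeof pkt));
    return 0;
}

The receiver recomputes the sum including the checksum field; an error-free packet sums to 0xFFFF.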

Pipelining

  • Go-Back-N (GBN):
    • Sliding window
    • Cumulative ACK
    • One timer per window
Sender Window Size = 4
Packets Sent: [0, 1, 2, 3] → Awaiting ACKs

ACK for 0 received → Slide window to [1, 2, 3, 4]
Packets 1 and 2 lost in transit
Packet 3 arrives out of order → Receiver discards it and re-ACKs 0 (cumulative)

Timeout for Packet 1 → Resend: 1, 2, 3, 4
  • Selective Repeat:
    • Individual ACKs
    • Timer per packet
Sender Window Size = 4
Packets Sent: [0, 1, 2, 3]
ACKs received for 0 and 2
Timer expires only for packet 1 → Resend packet 1

Only missing packet is resent → Bandwidth-efficient

UDP

  • Lightweight, best-effort delivery, optional checksum

TCP

  • Reliable byte-stream service
  • Combines Go-Back-N and Selective Repeat
  • ACK = next expected byte
  • Fast retransmit: After 3 duplicate ACKs
  • Timeouts: retransmission timeout derived from an exponentially weighted moving average of RTT samples
Sequence: [1000, 1001, 1002]
ACKs: ACK=1003 (bytes up through 1002 received; 1003 is the next expected byte)

Fast Retransmit:
Duplicate ACKs for 1001 ×3 → Immediate resend of 1001

TCP Flow Example:

Client                         Server
  | SYN ---------------------> |
  |                            |
  | <----------------- SYN-ACK |
  | ACK ---------------------> |

Congestion Control:

  • Additive Increase, Multiplicative Decrease
  • Slow Start: Exponential ramp-up
  • Congestion Avoidance: Linear increase post-threshold
CWND (Congestion Window Size)
^
|        /\
|       /  \
|      /    \
|_____/      \_______
       Additive     Multiplicative
       Increase       Decrease
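
A toy C simulation of the sawtooth above, assuming a loss is detected every 12 RTTs (real TCP reacts to actual loss/duplicate-ACK signals and adds fast recovery):

#include <stdio.h>

/* cwnd grows exponentially during slow start, linearly after the
   threshold, and is halved when a loss is (pretend) detected. */
int main(void) {
    double cwnd = 1, ssthresh = 16;
    for (int rtt = 1; rtt <= 30; rtt++) {
        if (rtt % 12 == 0) {              /* simulated loss */
            ssthresh = cwnd / 2;
            cwnd = ssthresh;              /* multiplicative decrease */
        } else if (cwnd < ssthresh) {
            cwnd *= 2;                    /* slow start */
        } else {
            cwnd += 1;                    /* additive increase */
        }
        printf("RTT %2d: cwnd = %.1f\n", rtt, cwnd);
    }
    return 0;
}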

Network Layer

IP Model

  • Best-effort: No delivery guarantees

Planes

  • Data Plane: Packet forwarding
  • Control Plane: Path computation (routing)

DHCP (Dynamic Host Configuration Protocol)

  • Assigns IP addresses dynamically
  • Runs over UDP

DHCP Message Exchange:

  1. Discovery (Host → 255.255.255.255)
  2. Offer (Server)
  3. Request (Host)
  4. ACK (Server)

NAT (Network Address Translation)

  • Private IP → Public IP + Port
  • Reduces global IP usage

VPN

  • Adds encrypted headers (tunneling)

IPv6 Transition

  • Uses tunneling to wrap IPv6 in IPv4 headers

Devices

| Device   | Function                     |
|----------|------------------------------|
| Router   | Forwards packets (network)   |
| Switch   | Forwards frames (link)       |
| Firewall | Permits or denies traffic    |
| NAT      | IP translation for networks  |


Routing Protocols

  • Link-State:
    • Flood costs to all
    • Compute shortest paths (e.g., Dijkstra)
  • Distance Vector:
    • Bellman-Ford algorithm
    • Slower convergence

Routing Domains

  • BGP: Inter-AS (between networks)
  • OSPF: Intra-AS (within networks)

Forwarding

  • Uses forwarding tables
  • Matches IP address prefix to interface
| Prefix        | Interface |
|---------------|-----------|
| 192.168.1.0/24| eth0      |
| 10.0.0.0/8    | eth1      |
| 0.0.0.0/0     | eth2      | ← Default route

Packet Dest: 192.168.1.34 → eth0
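
A minimal C sketch of longest-prefix matching over this table (routes are hard-coded; a real router uses a trie or TCAM):

#include <stdio.h>
#include <stdint.h>

/* Prefixes are stored as (network, prefix length, interface);
   the most specific matching prefix wins. */
struct route { uint32_t net; int len; const char *iface; };

struct route table[] = {
    { 0xC0A80100, 24, "eth0" },   /* 192.168.1.0/24    */
    { 0x0A000000,  8, "eth1" },   /* 10.0.0.0/8        */
    { 0x00000000,  0, "eth2" },   /* 0.0.0.0/0 default */
};

const char *lookup(uint32_t dst) {
    const char *best = NULL;
    int best_len = -1;
    for (int i = 0; i < 3; i++) {
        uint32_t mask = table[i].len ? 0xFFFFFFFFu << (32 - table[i].len) : 0;
        if ((dst & mask) == table[i].net && table[i].len > best_len) {
            best = table[i].iface;
            best_len = table[i].len;
        }
    }
    return best;
}

int main(void) {
    uint32_t dst = (192u << 24) | (168u << 16) | (1u << 8) | 34;  /* 192.168.1.34 */
    printf("forward via %s\n", lookup(dst));                      /* eth0 */
    return 0;
}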

Link Layer

  • Transmits between adjacent nodes using MAC addresses

ARP (Address Resolution Protocol)

  • IP → MAC address
  • Uses broadcast to discover addresses

Channel Access Methods

Protocol Description
FDMA Divide channel by frequency
TDMA Divide channel by time
CDMA Assign unique codes to users
Slotted ALOHA Time slots; retry on collision
Pure ALOHA No time slots; even higher collision rate
CSMA Sense channel before sending
CSMA/CD Detect collision and stop transmission

FDMA

User 1 → 0–1 MHz
User 2 → 1–2 MHz
User 3 → 2–3 MHz

Each gets a frequency band.

TDMA

Time Slot Assignment:
Slot 1: User A
Slot 2: User B
Slot 3: User C

Users transmit in their time window only.

CDMA

User 1: Code 1001
User 2: Code 1100

Both send at once, decoded at receiver using code correlation.

ALOHA

User 1 sends packet → Collision with User 2
Wait random time, retransmit

Pure ALOHA: Random anytime
Slotted ALOHA: Only at slot boundaries

CSMA/CD Example

User 1 senses channel → Idle → Transmit
Collision occurs with User 2
Both stop immediately
Wait random time → Retry

Switches vs Routers

Device Layer Purpose
Switch Link Forward & flood frames
Router Network Forward packets