class notes
Crash Course GMU
A compilation of all the notes from all my CS classes at GMU – the ultimate revision guide.
Why did I take the time to compile this? By my final year of college, everything felt like a blur, and it seemed like I had learned nothing. Of course, after taking so many classes, I had definitely gained knowledge. However, I wanted to do one grand revision of everything I had learned, from the beginning to the very end.
My notes are neither the most organized nor fully comprehensive and should not be used as a cram study guide for final exams (you will undoubtedly fail). They are meant to offer a somewhat detailed summary of what you might learn over four years, along with random facts I found interesting. You’ll notice the quality of notes declines as the semester progresses, with a steep drop after midterm season. Alas, such is the life of a student.
Happy studying!
All Classes
Links to jump to any class you need help with:
CS 110 - Essentials of Computer Science
The history of computer science spans several generations, with different tools and engineers marking each period:
- Vacuum Tubes – Systems programmers wrote programs to convert system languages to machine languages.
- Transistors – Application programmers wrote instructions for computers to execute operations.
- Circuit Boards & Operating Systems – Non-programmers could start using computers for tasks.
- Large-Scale Integration (Silicon Chips) – Enabled word processing and more advanced software.
- Interconnected CPUs & GPUs – Widespread computer usage and the rise of the internet.
Notable Figures:
- Ada Lovelace – Often considered the first female programmer. Her work built upon Babbage’s Analytical Engine.
- Charles Babbage – Designed the Analytical Engine during the Industrial Revolution to automate calculations. Inspired by the Jacquard loom’s punch cards, which allowed for repeated patterns.
- Alan Turing – Father of theoretical computer science. Developed the Bombe to crack Enigma codes during WWII. His Turing Machine and Turing Test remain foundational in AI research.
Key Concepts:
- Binary (Base 2): 0-1 (1 bit)
- Octal (Base 8): 0-7 (3 bits)
- Decimal (Base 10): 0-9
- Hexadecimal (Base 16): 0-F (4 bits)
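A quick Python sketch (my addition, using only built-ins) showing the same value written in each of these bases:

```python
# One value, four bases, using Python's built-in conversions.
n = 0b101010           # binary literal (base 2)
print(n)               # 42   (decimal, base 10)
print(oct(n))          # 0o52 (octal, base 8)
print(hex(n))          # 0x2a (hexadecimal, base 16)
print(int("2A", 16))   # 42, parsing a hex string back into an integer
```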
Data Compression
Reducing storage space needed for data.
- Lossless Compression – No data is lost during compression.
- Compression Ratio – How much a file is reduced in size.
- Bandwidth – The amount of data that can be transferred at a time.
Types of Encoding:
- Keyword Encoding – Replaces common words with shorter representations (e.g., “and” → “&”).
- Run-Length Encoding – Compresses repeated characters (e.g., 1111110001111 → 613041, i.e., six 1s, three 0s, four 1s; see the sketch below).
- Huffman Encoding – More frequent characters get shorter bit representations.
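A tiny Python sketch of run-length encoding, just to make the example above concrete (my own illustration, not course code):

```python
from itertools import groupby

def run_length_encode(s: str) -> str:
    """Compress runs of repeated characters, e.g. '1111110001111' -> '613041'."""
    return "".join(f"{len(list(group))}{ch}" for ch, group in groupby(s))

print(run_length_encode("1111110001111"))  # 613041
```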
Digital signals are preferred for data transfer because they are resistant to environmental interference.
Computer Architecture
- Virtual Machines (VMs): Emulations of physical computers stored as files (“images”).
- Hypervisor: Software that creates and manages VMs.
- Cloud Computing: Remote data storage and computing services with concerns over security and third-party reliance.
- Heap Memory: Used by all parts of a program and dynamically allocated.
- Stack Memory: Used for local variables and function calls; prone to overflow.
Storage & Hardware
- Solid State Drives (SSD): Fast, durable, non-mechanical.
- Hard Disk Drives (HDD): Higher storage capacity, mechanical, cheaper but more fragile.
- CPU (Central Processing Unit): Executes programs; includes the ALU (Arithmetic Logic Unit), control logic, and registers.
CS 211 - Object-Oriented Programming
Compilation Process
- `.java` files are converted to `.class` files by the Java compiler.
- Java does not have an assembler; the compiler converts `.java` files directly into bytecode.
- The Java Virtual Machine (JVM) generates machine code at runtime, acting as an interpreter.
- Advantage: Since Java includes its own runtime environment, it can be executed on any device. This was likely one of the primary reasons Java was created.
Tokens in Java
Java has six types of tokens:
- Reserved Keywords – `if`, `else`, `int`, `for`, `while`
- Identifiers – `x`, `i`, `objName`, `varName`
- Literals – `7`, `"hello"`, `8.9`
- Operators – `+`, `-`, `/`, `*`
- Separators – `[]`, `{}`, `()`
- Terminators – `;`
Example:
int count = 7; // reserved, identifier, operator, literal, terminator
Increment Operators
int x = 9;
- `x++` → Evaluates `x`, then increments it.
  - `print(x++)` → 9
  - `print(x)` → 10
- `++x` → Increments `x`, then evaluates it (starting again from `x = 9`).
  - `print(++x)` → 10
  - `print(x)` → 10
Variable Types in Java
- Class Variables: Shared across all objects of the class; changing the value through one instance affects all others. Declared using the `static` keyword.
- Instance Variables: Each object has its own copy. Modifying one object's variable does not affect others.
- Local Variables: Exist only within a method. Once the method returns, the variable’s scope ends.
Primitive Data Types & Their Sizes
Integers
- `byte` – 1 byte
- `short` – 2 bytes
- `int` – 4 bytes
- `long` – 8 bytes
Floating-Point Numbers
- `float` – 4 bytes
- `double` – 8 bytes
Characters
- `char` – 2 bytes (Java chars are 16-bit UTF-16 code units)
Booleans
- `boolean` – size is JVM-dependent (often treated as 1 byte)
Reference Data Types
- Any non-primitive type or dynamically created object.
- Includes:
- Data structures (e.g., stacks, queues)
- Objects of a class
Autoboxing & Unboxing
- Autoboxing: Converting a primitive to its wrapper reference type (e.g., `int` → `Integer`).
- Unboxing: Converting a wrapper reference type back to a primitive (e.g., `Integer` → `int`).
Type Conversion & Typecasting
Implicit Conversion
- Converts to a larger data type (no data loss).
int x = 4; long y = x;
Explicit Conversion (Typecasting)
- Required when converting to a smaller type.
long x = 4; int y = (int) x;
Object References & Comparisons
Demo d = new Demo(); // Creates a new object
Demo d1 = d; // Assigns d1 to reference of d
- `d` and `d1` refer to the same object.
- `d == d1` → `true`
Jagged Arrays
int[][] array = new int[2][];
array[0] = new int[3];
array[1] = new int[4];
Strings in Java
Creating Strings
String s = "hey";
String t = new String("hi");
- `s == t` returns `false` because the references are compared, not the contents.
Correct String Comparison
s.equals(t)
s.compareTo(t) == 0
String Immutability
- Strings in Java are immutable.
- Operations like `+=` or `" " + " "` create a new string object.
- This can cause performance issues in large applications.
- Use `StringBuilder` for efficient string manipulation.
Java I/O
System.in
System.out
System.err
Object-Oriented Programming (OOP) Concepts
Classes & Methods
- Class: A collection of fields (data) and methods (functions).
- `this` Reference:
  - Refers to the current object instance.
  - Useful in constructors to differentiate between instance variables and input arguments.
Cascading Constructors
class Person {
Person() {
this("hi", 2); // Calls the constructor with two parameters
}
Person(String name, int age) {
this.name = name;
this.age = age;
}
}
Instance vs. Static Methods
- Instance Methods → Require an object to be called.
- Static Methods → Cannot access instance variables, but can access static variables.
Method Overloading & Overriding
- Overloading: Same method name, different parameters.
- Overriding: Same method name and parameters, redefined in a subclass.
Encapsulation: Access Modifiers
| Modifier | Access Level |
|---|---|
| `public` | Accessible from any class |
| `private` | Accessible only within the same class |
| `protected` | Accessible within the same package and by subclasses |
| (default) | Accessible within the same package |
Inheritance
- Defines relationships between parent and child classes (an "is-a" relationship).
- `final` class → Cannot be extended.
- `abstract` class → Cannot be instantiated directly; must be extended.
Polymorphism
- Static Polymorphism → Method overloading.
- Dynamic Polymorphism → Method overriding.
STAT 344 Probability and Statistics
Statistics measures variability. Probability measures uncertainty. No. I have no idea what that means.
Types of Models:
- Mechanistic → Based on theory/principles (e.g., gas laws)
- Empirical → Applied science and math
- Regressional → Builds on empirical models and offers predictions
- Probabilistic → Attempts to quantify model errors
Fundamental Concepts:
- Random Experiment: Same input leads to different outputs.
- Sample Space: Set of all possible outcomes.
- Discrete Sample Spaces → Finite, quantifiable outcomes.
- Continuous Sample Spaces → Infinite outcomes (e.g., number line).
- Event: A subset of the sample space, typically denoted with a capital letter.
Axiomatic Probability:
- Axiom/Postulate: A statement assumed to be true.
- Axiomatic System Features:
- Consistent → No contradictions.
- Independent → Axioms cannot be derived from each other.
- Complete → Everything can be derived from axioms.
Conditional Probability:
[ P(A|B) = \frac{P(A \cap B)}{P(B)} ]
Independence of Events:
- ( P(A \mid B) = P(A) )
- ( P(A \cap B) = P(A) \times P(B) )
Bayes’ Theorem
(Came back during NLP Sentiment Analysis)
[ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} ]
Distribution Functions
Probability distribution → Probabilities associated with specific values of ( x ).
Probability Mass Function (PMF)
Example:
- ( f(x_i) > 0 ) for discrete values
- Cumulative Distribution Function (CDF) → running sum of the PMF (integral of the PDF in the continuous case)
Properties of the CDF
- Describes a probabilistic function.
- Non-decreasing.
- Defined for all real numbers.
- As ( x \to +\infty ), ( P = 1 ); as ( x \to -\infty ), ( P = 0 ).
Example Discrete CDF:
( P(X = x) = \frac{1}{6} ) for ( x \in \{1, 2, 3, 4, 5, 6\} ).
Piecewise function:
[ F(x) = \begin{cases} 0, & x < 1 \\ 1/6, & 1 \leq x < 2 \\ 1/3, & 2 \leq x < 3 \\ 1/2, & 3 \leq x < 4 \\ 2/3, & 4 \leq x < 5 \\ 5/6, & 5 \leq x < 6 \\ 1, & x \geq 6 \end{cases} ]
Special Discrete Distribution Functions:
- Discrete Uniform Distribution
- Description: Equally likely outcomes over integer range ([a, b]).
- Expectation: ( E(X) = \frac{a + b}{2} )
- Variance: ( \frac{(b-a+1)^2 - 1}{12} )
- Example: Fair die roll ((a=1, b=6), E(X) = 3.5, Var(X) \approx 2.92 ).
- Binomial Distribution (Extension of Bernoulli trials with defined trials)
- Description: Number of successes in ( n ) independent trials.
- PMF: ( P(X=k) = C(n,k) p^k (1-p)^{n-k} )
- Expectation: ( np )
- Variance: ( np(1-p) )
- Example: 10 coin flips ((p=0.5), E(Heads) = 5, Var(X) = 2.5 ).
- Geometric Distribution
- Description: Trials until first success.
- PMF: ( P(X=k) = (1-p)^{k-1} p )
- Expectation: ( 1/p )
- Variance: ( (1-p)/p^2 )
- Example: Drawing cards with replacement until an ace appears ((p=4/52), E(Trials) = 13, Var(X) = 156).
- Negative Binomial Distribution
- Description: Trials until ( r ) successes.
- PMF: ( P(X=k) = C(k-1, r-1) p^r (1-p)^{k-r} )
- Expectation: ( r/p )
- Variance: ( r(1-p)/p^2 )
- Example: Sales calls until 3 sales ((p=0.2), E(Calls) = 15, Var(X) = 60).
- Hypergeometric Distribution
- Description: Successes in draws without replacement (e.g., card games).
- PMF: ( P(X=k) = \frac{C(K,k) C(N-K,n-k)}{C(N,n)} )
- Expectation: ( n \times (K/N) )
- Variance: ( n \times (K/N) \times (1-K/N) \times \frac{N-n}{N-1} )
- Example: Number of aces in a 7-card poker hand, ( E(Aces) = 7 \times 4/52 \approx 0.54 ).
- Poisson Distribution
- Description: Events in a fixed interval with known rate.
- PMF: ( P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!} )
- Expectation: ( \lambda )
- Variance: ( \lambda )
- Example: Call center receives 5 calls/hour, ( E(Calls \text{ in 2 hrs}) = 10, Var(X) = 10 ).
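A quick Python check of the binomial coin-flip example above (my addition, standard library only):

```python
from math import comb

n, p = 10, 0.5                                   # 10 fair coin flips
pmf = lambda k: comb(n, k) * p**k * (1 - p)**(n - k)

mean = sum(k * pmf(k) for k in range(n + 1))      # should equal n*p = 5
var = sum((k - mean) ** 2 * pmf(k) for k in range(n + 1))  # should equal n*p*(1-p) = 2.5
print(round(mean, 3), round(var, 3))              # 5.0 2.5
```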
Continuous Probability Distributions
- Continuous Uniform Distribution
- Description: Similar to discrete uniform, but for continuous values.
- PDF: ( f(x) = \frac{1}{b-a}, \quad a \leq x \leq b )
- Expectation: ( (a+b)/2 )
- Variance: ( (b-a)^2 / 12 )
- Normal (Gaussian) Distribution
- Description: A symmetric distribution centered around the mean.
- PDF: ( f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2 / (2\sigma^2)} )
- Expectation: ( \mu )
- Variance: ( \sigma^2 )
- Exponential Distribution
- Description: Time between events in a Poisson process.
- PDF: ( f(x) = \lambda e^{-\lambda x}, \quad x \geq 0 )
- Expectation: ( 1/\lambda )
- Variance: ( 1/\lambda^2 )
- Gamma Distribution
- Description: Generalization of exponential; models time until the ( k )-th event in a Poisson process.
- PDF: ( f(x; k, \lambda) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}, \quad x \geq 0 )
- Expectation: ( k/\lambda )
- Variance: ( k/\lambda^2 )
CS 262 - Low-Level Programming in C
PCAL: The Compilation Process
You will likely be quizzed and tested on the PCAL process:
- Preprocessing: Strips comments, expands macros, and includes header files.
- Compiling: Converts C source files into assembly code.
- Assembling: Converts assembly into machine (binary) code.
- Linking: Links object files together into an executable binary.
Use the `gcc` compiler to compile C files:
gcc hello.c -o hello # Creates an executable named 'hello'
Basic Program Structure
All C programs begin execution from the `main` function:
int main() {
return 0;
}
To print to the console, include the `stdio.h` library:
#include <stdio.h>
int main() {
printf("hello");
return 0;
}
Constants and Macros
Use `#define` to define constants and macros:
#define GREETING "hello"
#include <stdio.h>
int main() {
printf(GREETING);
return 0;
}
Reading Input
Use `scanf()` to read user input. Common conversion specifiers:
- `%d` – Decimal integer
- `%f` – Float
- `%e` – Exponential
- `%g` – Compact exponential/float
scanf("%d", &x); // Reads integer into x
scanf("%d,%f", &x, &y); // Reads integer and float
Data Type Conversions
- `int + float` results in a `float`
- `int / int` truncates the decimal (integer division)
- `int i = 33.4f;` truncates to `33`
Comparison chaining like `i < j < k` compiles but does not work as expected (it evaluates as `(i < j) < k`).
Booleans
C uses integers (`0` = false, non-zero = true). Use `stdbool.h` for the `true` and `false` keywords, but honestly, you can just use 0's and 1's and it works the same.
Function Parameters
- Pass by value: Copies variable; original is unchanged.
- Pass by reference: Pass pointer to variable; original can be modified.
C defaults to pass-by-value. Use pointers to simulate pass-by-reference.
Variable Scope
- `extern`: Declares a global variable defined elsewhere.
- `static`: Keeps a variable persistent across function calls.
Arrays
int array[10] = {0}; // All elements set to 0
int array[10] = {1, 2, 3}; // Remaining values default to 0
int array[10] = {[2] = 3, [7] = 4}; // Sparse initialization
int length = (int)(sizeof(array) / sizeof(int));
Function Declaration
<return-type> functionName(<parameters>) {
// function body
return <value>; // Default return type is int
}
Arrays cannot be returned directly, but can be passed as parameters.
Exit
#include <stdlib.h>
exit(0); // Terminates the program
Pointers
POINTERS — honestly one of the most important concepts in C.
Pointers store memory addresses:
int a = 5;
int *p = &a; // p holds address of a
*p = 10; // modifies value of a to 10
- `const int *p`: Cannot modify the value pointed to
- `int *const p`: Cannot change the address stored
Pointers & Arrays
Pointers and arrays are closely related, primarily due to memory’s structure as one big array.
You can use pointers instead of array indexing to retrieve a value at each index:
int a[10];
int *p = &a[2]; // Points to index 2
int *q = p + 3; // Points to index 5
Multidimensional arrays are stored in row-major order, so an incremented pointer moves across the columns of a row before reaching the next row.
int a[5][5]; // a → a[0]
Strings
String literals are char pointers:
char *s = "abc";
Use `<string.h>` for string functions:
strlen(str)
strcat(str1, str2)
Multifile Compilation
#include <filename> // System headers
#include "filename" // User-defined files
Gives your file access to functions and variables from other files. Creates a class-like feel to your code.
Preprocessor Directives
Preprocessors exist for:
- File inclusions (see above)
- Macro definitions:
#define SPEED_OF_LIGHT 2.9
#define IS_EVEN(n) ((n) % 2 == 0)
#define HI(...) (printf(#__VA_ARGS__))
- Conditional compilation:
#ifdef CROSS_COMPILE
// only compiled for the BeagleBone
#endif
File I/O
FILE *fp = fopen("in.dat", "r");
fclose(fp);
Modes:
- `r` – read
- `w` – write (overwrites)
- `a` – append
Functions:
- `fread()`, `fwrite()`
- `fseek()` with `SEEK_SET`, `SEEK_END`, `SEEK_CUR`
Dynamic Memory
Another really important concept in 262!!
You can claim memory from the heap directly in C:
- `malloc(size)` – Allocates a memory block
- `calloc(n, size)` – Allocates and zeroes memory
- `realloc(ptr, new_size)` – Resizes a memory block
- `free(ptr)` – Frees memory to avoid leaks
C has no garbage collector — you must manually free what you allocate.
Structs
Group variables under one name:
struct Person {
char name[50];
int age;
float salary;
};
struct Person p;
p.age = 5;
strcpy(p.name, "Bob");
With `typedef`:
typedef struct {
char name[50];
int age;
float height;
} Person;
Person p1;
p1.age = 30;
Structs can be passed to and returned from functions. You can also have nested structs.
Unions
Similar to structs but all members share memory:
union {
int i;
double d;
} u;
Saves memory, but changing one affects all others. Don’t use often — makes code hard to read.
Enums
enum Suit { CLUBS, SPADES };
Just like Java enums. Assigns integer values to names.
Linked Lists
You’ll probably implement this in like 3-4 classes (262, 367, 471, etc) so honestly implementing a linked list becomes second nature
struct Node {
int data;
struct Node* next;
};
struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
newNode->data = value;
newNode->next = NULL;
Always use pointers in linked lists because the memory is dynamically allocated.
Bitwise Operations
Operate directly on bits:
int i = 4; // 0100
i << 1; // 1000 → 8
i >> 2; // 0001 → 1
& AND
| OR
^ XOR
~ NOT
CS 310 - Data Structures
Time vs Space Complexity
- Time complexity – How long it takes for an algorithm to run.
- Space complexity – How much space is occupied by the algorithm
Trade-offs
- When is it better to sacrifice time or storage?
- Typically depends on readability, long-term use case, and how important latency is to the consumer
Dynamic Arrays
- Arrays without a fixed size; you can add any number of elements.
- Creation: Since arrays are of fixed size in low-level languages, to implement a dynamic array you have to:
- Start with an initial size (user-defined or arbitrary)
- As more values are added and you run out of space, create a new array with double the size
- Copy all elements from the old array into the new one (an expensive operation, but infrequent)
- Deletion:
- To delete a particular index, remove the element and shift everything after it to the left
- If the array becomes less than 1/3 full, you can consider shrinking (halving) the array to save space
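A rough Python sketch of the grow/shrink logic above (Python lists already do this internally; this is just to make the steps concrete):

```python
class DynamicArray:
    def __init__(self, capacity=4):
        self.size = 0
        self.capacity = capacity
        self.data = [None] * capacity

    def append(self, value):
        if self.size == self.capacity:          # out of space: double and copy
            self._resize(self.capacity * 2)
        self.data[self.size] = value
        self.size += 1

    def delete(self, index):
        for i in range(index, self.size - 1):   # shift everything after index left
            self.data[i] = self.data[i + 1]
        self.size -= 1
        if self.size < self.capacity // 3:      # under 1/3 full: halve to save space
            self._resize(max(4, self.capacity // 2))

    def _resize(self, new_capacity):
        new_data = [None] * new_capacity
        for i in range(self.size):              # copy elements into the new array
            new_data[i] = self.data[i]
        self.data = new_data
        self.capacity = new_capacity
```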
Linked Lists
- A data structure in which objects do not need to be stored sequentially in memory; each node stores a value plus a pointer/reference to the next object.
- Helpful for memory because now consecutive chunks of memory are no longer required, making it easier to create and delete objects dynamically.
Creation
- In Java, you can create objects for each item in the list.
- In C/C++, use structs.
- Typically each node has 2-3 fields: one for the value, one or two for the next/previous pointers.
Types
- Singly Linked List: Only pointer to the next node; supports one-way traversal.
- Doubly Linked List: Pointers to both next and previous nodes; easier to traverse and modify.
- Circular Linked List: Tail points back to the head; forms a loop.
Stacks and Queues
Stack (LIFO)
- First in, last out.
Queue (FIFO)
- First in, first out.
Implementation
- Can be implemented with arrays or linked lists.
- Linked list implementation is often better for queues because it avoids shifting.
- Array implementation typically uses two pointers (start and end).
- When the end pointer gets to the end of the array, and the start pointer is not at index 0, it wraps around (circular buffer).
- If the queue needs to be expanded, the size is doubled. Values are re-added starting from the start pointer to the end of the original array, and then from index 0 to the end pointer.
Priority Queue
- Special case where the highest (or lowest) priority element is removed first.
- Typically implemented with a min or max heap.
- Order is based on value, not insertion time.
Hashmaps / Hash Tables
- Solve the problem of search time.
- For other data structures, lookup is O(n); with hashmaps, it’s O(1) on average.
Implementation
- Hash tables are implemented as arrays under the hood.
- A hash function maps a key to an index:
index = key % len_array # simple but not great hash function
- The value is stored at that index.
Collision Handling
What if two values map to the same index?
- Separate Chaining:
- The array stores pointer objects (like linked lists).
- Multiple values that hash to the same index are added to the list.
- Downside: slower lookups due to list traversal and potential for all values to cluster in one index.
- A fix: track the longest list; if it exceeds a threshold, rehash with a new function or bigger array.
- Linear Probing:
- If index is taken, look at the next available index (index + 1, +2, …).
- Pros: avoids wasting space; Cons: clustering and longer probe times.
- Quadratic Probing:
  - Similar to linear probing, but the index jump increases quadratically: index + n^2
- Double Hashing:
- Apply another hash function if a collision occurs, and retry until an empty spot is found.
C++ uses a combination of separate chaining with open addressing (linear/quadratic probing).
Eventually, as the table gets filled up, a new hash function is needed and the array has to be extended.
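A small Python sketch of separate chaining with a simple modulo-style hash (my own illustration, using lists of (key, value) pairs instead of real linked lists):

```python
class ChainedHashTable:
    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]

    def _index(self, key):
        return hash(key) % len(self.buckets)    # simple modulo hash

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):     # update if the key already exists
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))             # otherwise chain onto this bucket

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable()
table.put("apple", 3)
table.put("pear", 5)
print(table.get("apple"))  # 3
```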
Graphs
- Graphs are made up of vertices (nodes) and edges (arcs).
- Edges are defined by vertex pairs.
- If the pairs are ordered (e.g., [1,2] ≠ [2,1]), the graph is directed (directional edges).
Terminology
- Path: Sequence of vertices with a valid edge between each pair.
- Simple Path: No repeated vertices.
- Cyclic: Start and end vertex are the same.
- Weighted Path: Edges have associated “costs”.
Representations
Adjacency Matrix
- nxn matrix where n is the number of vertices.
- Rows = starting vertex, columns = ending vertex.
- Best for dense graphs but uses a lot of space.
|   | A | B | C | D | E |
|---|---|---|---|---|---|
| A | 0 | 1 | 1 | 1 | 0 |
| B | 1 | 0 | 0 | 0 | 1 |
| C | 1 | 0 | 0 | 0 | 1 |
| D | 1 | 0 | 0 | 0 | 0 |
| E | 0 | 1 | 1 | 0 | 0 |
Adjacency List
- Each index represents a vertex.
- Values are a list of vertices it’s connected to.
- Great for sparse graphs (which is often the case).
A: B, C, D
B: A, E
C: A, E
D: A
E: B, C
Graph Traversals
- BFS: Queue system; level-order traversal.
- DFS: Stack or recursion; post-order style traversal.
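Quick Python sketches of both traversals, run on the adjacency-list example above (my addition):

```python
from collections import deque

graph = {"A": ["B", "C", "D"], "B": ["A", "E"], "C": ["A", "E"], "D": ["A"], "E": ["B", "C"]}

def bfs(start):
    visited, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()                # queue gives level-order
        order.append(node)
        for nbr in graph[node]:
            if nbr not in visited:
                visited.add(nbr)
                queue.append(nbr)
    return order

def dfs(node, visited=None):
    if visited is None:
        visited = []
    visited.append(node)                      # recursion acts as the stack
    for nbr in graph[node]:
        if nbr not in visited:
            dfs(nbr, visited)
    return visited

print(bfs("A"))  # ['A', 'B', 'C', 'D', 'E']
print(dfs("A"))  # ['A', 'B', 'E', 'C', 'D']
```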
Minimum Spanning Tree (MST)
A spanning tree that connects all vertices using a subset of the edges (any spanning tree has exactly V - 1 edges).
- Cannot contain cycles or be disconnected.
- If using weights, it should have the lowest total weight.
- A graph can have multiple MSTs; if all weights are unique, there is only one.
Prim’s Algorithm
- Eliminate self-loops and duplicate edges (keep the least weight).
- Pick a root node.
- Pick the least weight edge from the vertex to any unvisited vertex.
- Continue picking the minimum edge that connects to the growing tree.
- Repeat until all vertices are visited.
Kruskal’s Algorithm
- Eliminate self-loops and duplicates.
- Sort all edges in increasing order.
- Keep connecting edges (in order) as long as they don’t form a cycle.
- Stop once MST is formed.
Topological Sort
- Can only be performed on a Directed Acyclic Graph (DAG).
- Common example: course scheduling.
- For each vertex, track number of inbound edges (in-degree).
- Find all vertices with in-degree 0; add them to result list.
- Decrease in-degrees of connected vertices.
- Repeat until all vertices are processed.
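A Python sketch of this in-degree approach (Kahn's algorithm); the course-prerequisite DAG here is made up just for illustration:

```python
from collections import deque

def topological_sort(graph):
    # graph: {node: [nodes it points to]}
    in_degree = {node: 0 for node in graph}
    for node in graph:
        for nbr in graph[node]:
            in_degree[nbr] += 1

    queue = deque(n for n, d in in_degree.items() if d == 0)   # start with in-degree 0
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in graph[node]:                                 # "remove" outgoing edges
            in_degree[nbr] -= 1
            if in_degree[nbr] == 0:
                queue.append(nbr)
    return order

courses = {"CS110": ["CS211"], "CS211": ["CS310", "CS262"],
           "CS262": ["CS367"], "CS310": [], "CS367": []}
print(topological_sort(courses))  # ['CS110', 'CS211', 'CS310', 'CS262', 'CS367']
```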
Heaps (Min/Max)
- Can be implemented as trees.
- Pain to insert/delete because of constant balancing.
- Min heap: smallest element at the top.
- Max heap: largest element at the top.
- Used in priority queues.
- Honestly, you don’t need to know much detail about this.
Trees
- Trees represent hierarchical structures.
- Binary Tree: Each node has ≤ 2 outbound edges and 1 inbound edge.
- Leaf nodes have no children.
- Root node has no parent.
- Usually implemented as linked lists with left/right child pointers. Can also be implemented using arrays.
Types of Binary Trees
Full/Proper Binary Tree
Each node has either 0 or 2 children.
A
/ \
B C
/ \
D E
Complete Binary Tree
All levels are fully filled except the last, which is filled left to right.
A
/ \
B C
/ \ /
D E F
Perfect Binary Tree
Both full and complete.
A
/ \
B C
/ \ / \
D E F G
Degenerate Tree
All internal nodes have one child.
A
\
B
\
C
/
D
Tree Traversals (using linked list nodes)
Inorder: left, root, right
def inorder(node):
if node is None:
return
inorder(node.left)
print(node.val)
inorder(node.right)
Postorder: left, right, root
def postorder(node):
if node is None:
return
postorder(node.left)
postorder(node.right)
print(node.val)
Preorder: root, left, right
def preorder(node):
if node is None:
return
print(node.val)
preorder(node.left)
preorder(node.right)
Constructing a Tree from Array Representation
Given Preorder + Inorder → build from the left
Preorder: [A, B, D, E, C, F, G]
Inorder: [D, B, E, A, F, C, G]
A
/ \
B C
/ \ / \
D E F G
Given Postorder + Inorder → build from the right
Postorder: [D, E, B, F, G, C, A]
Inorder: [D, B, E, A, F, C, G]
A
/ \
B C
/ \ / \
D E F G
Other Tree Types (Rarely Tested in Detail)
- Binary Search Tree (BST): For every node, everything in its left subtree is smaller and everything in its right subtree is larger.
- AVL Tree: A BST that can self-balance when one side becomes deeper.
- Red-Black Tree: A more complex self-balancing BST. (You might have 5 pages of notes on this – likely won’t use them.)
- B Tree: Balanced m-way tree. Binary trees have 2 children; B trees can have many. Each node can hold multiple values.
- B+ Tree: Like a B Tree, but all data is in the leaf nodes only.
Here’s a table comparing all the data structures.
Data Structure | Access | Insert | Delete | Search | Notes |
---|---|---|---|---|---|
Array | O(1) | O(n) | O(n) | O(n) | Fixed size (in C), insert/delete requires shifting |
Dynamic Array | O(1) | O(1)* | O(n) | O(n) | *Amortized insert at end, resizing occasionally |
Singly Linked List | O(n) | O(1)** | O(1)** | O(n) | **O(1) at head, O(n) at tail or specific index |
Doubly Linked List | O(n) | O(1)** | O(1)** | O(n) | Better for bidirectional traversal |
Stack (Array/LL) | O(n) | O(1) | O(1) | O(n) | LIFO: push/pop at top |
Queue (Array/LL) | O(n) | O(1) | O(1) | O(n) | FIFO: enqueue at rear, dequeue at front |
Deque | O(1) | O(1) | O(1) | O(n) | Double-ended queue, insert/delete both ends |
Hash Table | O(1) | O(1) | O(1) | O(1) | O(n) worst-case due to collisions, requires hashing |
Binary Search Tree | O(log n) | O(log n) | O(log n) | O(log n) | Average case; degrades to O(n) unless balanced (e.g., AVL, Red-Black) |
AVL Tree | O(log n) | O(log n) | O(log n) | O(log n) | Self-balancing BST |
Heap (Min/Max) | O(n) | O(log n) | O(log n) | O(n) | Efficient for priority queue, top is min/max |
Trie (Prefix Tree) | O(k) | O(k) | O(k) | O(k) | k = length of string; used for autocomplete, dictionaries |
Graph (Adj List) | O(1) | O(1) | O(1) | O(V + E) | Good for sparse graphs |
Graph (Adj Matrix) | O(1) | O(1) | O(1) | O(V²) | Good for dense graphs |
CS 330 - Formal Methods and Models
Propositional Logic
Propositions are statements that are either true or false (but not both).
- Examples:
- “It is raining.” (True/False)
- “3 + 4 = 7” (True)
If `a` and `b` are propositions, then:
- `a ∨ b` (logical OR) is also a proposition.
Logical Equivalence: Two propositions are logically equivalent if their truth tables are the same.
Tautology: A proposition that is always true. Contradiction: A proposition that is always false.
Advanced Example: Show that ((p ∧ q) → r) ≡ (¬r → ¬(p ∧ q))
- Use the contrapositive: p → q ≡ ¬q → ¬p
- Hence, the two statements are equivalent by the contrapositive.
Set Theory
- A set is a collection of distinct elements.
  - Extensional: {2, 3, 4}
  - Intensional: {x | x < 10}
Notation:
- ∅ = Empty set
- x ∈ S = x is an element of set S
- R ⊆ S = R is a subset of S (may be equal)
- T ⊂ S = T is a proper subset (T ≠ S)
- |S| = Cardinality (number of elements)
- 2^S = Powerset (set of all subsets of S)
Advanced Example: Let A = {1, 2} and B = {x | x < 4}. Then 2^A = {∅, {1}, {2}, {1, 2}}, and the union A ∪ B = {1, 2, 3}.
Functions:
- Total function: Every input has a mapped output.
- Partial function: Some inputs may not have outputs.
Logical Operators
- ¬ or ~ → NOT
- ∧ → AND
- ∨ → OR
- → → Implication (if…then)
- ↔ → Bi-conditional (if and only if)
Important Logical Laws:
- Contrapositive: p → q ≡ ¬q → ¬p
- Conditional Law: p → q ≡ ¬p ∨ q
- Distributive: p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
- Subsumption: p ∧ (p ∨ q) ≡ p
- De Morgan's Laws:
  - ¬(p ∧ q) ≡ ¬p ∨ ¬q
  - ¬(p ∨ q) ≡ ¬p ∧ ¬q
Truth Table Example
| p | q | p → q | ¬q | ¬q → ¬p |
|---|---|-------|----|---------|
| T | T | T | F | T |
| T | F | F | T | F |
| F | T | T | F | T |
| F | F | T | T | T |
Demonstrates equivalence of implication and contrapositive.
Proofs and Reasoning
- A proof is a sequence of logically valid steps to demonstrate the truth of a statement.
- Theorem: Proven statement
- Axiom: Accepted as true without proof
- Inference rules: Methods to derive conclusions
- Assumptions: Temporary truths to explore implications (shown in square brackets: [p])
Common Rules:
- Modus Ponens: From p and p → q, conclude q
- Modus Tollens: From p → q and ¬q, conclude ¬p
Example Proof:
[p → q] (Assumption)
[p] (Assumption)
[q] (From 1, 2 by Modus Ponens)
Advanced Example: Prove ((p → q) ∧ (q → r)) → (p → r)
- Assume (p → q) and (q → r)
- Assume p
- From p → q and p, conclude q
- From q → r and q, conclude r
- Therefore, p → r
Predicate Logic
- A predicate is a function whose output is either true or false.
  - n > 3 is a predicate.
  - 5 > 3 is a proposition (concrete value).
Truth Set: The set of all values that make a predicate true.
Quantifiers:
- Universal Quantifier: ∀x P(x) — for all x, P(x) holds
- Existential Quantifier: ∃x P(x) — there exists an x such that P(x)
Negation:
- ¬∃x P(x) ≡ ∀x ¬P(x)
- ¬∀x P(x) ≡ ∃x ¬P(x)
Example:
- ∀x (x > 0 → x^2 > 0)
- ∃x (x^2 = 4) ⇒ x could be 2 or -2
Sequences and Program Verification
- A sequence is an ordered collection of elements (repeats allowed).
Program Verification (Partial Correctness):
- Prove that the output is correct if the program terminates.
- Hoare Triple: {P} C {Q} where:
  - P is the precondition
  - C is the code/command
  - Q is the postcondition
Example:
{even(x)} x = x + 3; {¬even(x)}
Advanced Example:
{x = n ∧ n ≥ 0}
sum = 0;
i = 1;
while (i ≤ n) {
sum = sum + i;
i++;
}
{sum = n(n+1)/2}
Normal Forms
- Disjunctive Normal Form (DNF): OR of ANDs
(p ∧ q) ∨ (¬p ∧ r)
- Conjunctive Normal Form (CNF): AND of ORs
(p ∨ q) ∧ (¬p ∨ r)
Formal Languages
- Language: A set of strings
- Alphabet (Σ): Set of symbols (e.g., {a, b})
- Σ*: Set of all possible strings over the alphabet
- L ⊆ Σ*: A language is a subset of all strings
Examples:
- Alphabet: {a, b}
- String: abba
- Language: {a, ab, abb}
- Empty string: ε (also a valid string in many languages)
Advanced Example:
- Language: All strings over {a, b} where the number of a's is even and the total length is divisible by 3
Regular Languages and Regex
Regular Languages are defined by finite state automata and closed under union, concatenation, and Kleene star.
Examples:
- Strings ending in “ab”
- Strings with an even number of 0s: regex = (00)*
Regex Examples:
- (a|b)* : All strings of a's and b's
- a*b+ : One or more b's preceded by zero or more a's
Advanced Regex:
- Strings that contain "aba" as a substring: .*aba.*
Regular Grammars
Productions:
- X → aX, X → a, X → ε
Example (basic): Strings ending in “a”:
S → aS | bS | a
Advanced Example: Strings with at least one a followed by one b:
S → aA
A → bB | aA
B → aB | bB | ε
Finite Automata (DFA)
DFA Example: Strings ending in “ab”
- States: q0 (start), q1, q2 (accept)
- Transitions (full transition function; simulated in the sketch below):
q0 --a--> q1,  q0 --b--> q0
q1 --a--> q1,  q1 --b--> q2
q2 --a--> q1,  q2 --b--> q0
Example: Accept strings with an even number of a's and an odd number of b's
- Need 4 states to track the parity of a and b
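A small Python simulation of the "ends in ab" DFA (my own encoding of the transition table above):

```python
# Transition table: delta[state][symbol] -> next state
delta = {
    "q0": {"a": "q1", "b": "q0"},
    "q1": {"a": "q1", "b": "q2"},
    "q2": {"a": "q1", "b": "q0"},
}
accept = {"q2"}

def accepts(string):
    state = "q0"
    for ch in string:
        state = delta[state][ch]   # deterministic: exactly one next state
    return state in accept

print(accepts("aab"))   # True  (ends in "ab")
print(accepts("abb"))   # False
```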
NFA (Non-deterministic Finite Automata)
NFA Example: Accept strings with a substring “aba”
- Transitions:
q0 --a--> q1
q1 --b--> q2
q2 --a--> q3 (accept)
- Also include a self-loop on q0 for any character (to skip a prefix) and a self-loop on q3 for any character (so anything after "aba" is still accepted)
Pushdown Automata (PDA)
Used for languages like balanced parentheses:
PDA Example: Recognize L = { a^n b^n | n ≥ 0 }
- Push an a onto the stack for each input a
- Pop an a off the stack for each input b
- Accept if the stack is empty at the end
Advanced PDA Example: L = { a^n b^m c^n | n, m ≥ 0 }
- Push for a, ignore b, pop for c
- Accept on an empty stack
Context-Free Grammars (CFG)
CFGs can express nested patterns like palindromes and balanced expressions.
Example:
S → aSb | ε
Generates strings like: “”, “ab”, “aabb”, “aaabbb”, etc.
Advanced CFG: Palindromes over {a, b}:
S → aSa | bSb | a | b | ε
Turing Machines
Turing machines use an infinite tape to simulate any computation.
Basic Example: Accept strings with an equal number of a's and b's
- Scan the tape marking an a with X, then search for and mark a b with Y; repeat until no unmarked symbols remain
Advanced Example: Binary incrementer
- Input: A binary number like 1101
- Operation: Increment by 1 → 1110
- The Turing machine walks right to the end of the input, then moves left flipping trailing 1s to 0s until it finds a 0 (or blank), which it flips to 1
CS 484 - Data Mining
Note: I feel like they try to scare you away in this class with a lot of math, but push through it — I promise it becomes more manageable later. You’re allowed a cheat sheet, so just write down all the math. I high-key forgot half of linear algebra by the time I took this class. My notes don’t have too much math, probably because I myself didn’t understand it.
Data Mining
Data Mining is the process of discovering patterns and anomalies in large datasets.
Data = collection of data objects
- Discrete attributes: Categorical values
- Nominal: no order (e.g., gender, color)
- Ordinal: ordered (e.g., star ratings)
- Continuous attributes: Numeric
- Interval (e.g., temperature)
- Ratio (e.g., weight)
Data Cleaning
Raw data isn’t ready for machine learning. It often has:
- Missing values
- Duplicates
- Non-numerical data (qualitative)
- Skewed distributions
Solutions include:
- Replace missing values with mean, median, or mode
- Merge or remove duplicates
- One-hot encoding/vectorization for categorical variables
- Normalize/standardize data (e.g., z-score normalization)
- Dimensionality reduction to remove redundancy
K-Nearest Neighbors (KNN)
KNN is a lazy learning algorithm that classifies a point based on its k nearest labeled neighbors.
- Time Complexity: O(nD + nK)
- Hyperparameters:
  - `k`: Number of neighbors
  - Distance metric (Euclidean, Manhattan, Cosine)
- Effects of `k`:
  - Small `k` → overfitting
  - Large `k` → underfitting
Variants:
- Condensed KNN: Keeps only decisive points (usually boundary cases)
- Distance weighting: Closer points are weighted more heavily
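A bare-bones Python sketch of KNN with Euclidean distance and a majority vote (toy data I made up; standard library only):

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    # train: list of (features, label); query: feature tuple
    dists = sorted(train, key=lambda item: math.dist(item[0], query))  # Euclidean distance
    top_k = [label for _, label in dists[:k]]                          # k nearest labels
    return Counter(top_k).most_common(1)[0][0]                         # majority vote

train = [((1, 1), "neg"), ((2, 1), "neg"), ((8, 9), "pos"), ((9, 8), "pos"), ((7, 8), "pos")]
print(knn_predict(train, (8, 8), k=3))  # pos
```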
Validation Techniques
- Validation Set: Separate data for model evaluation
- Cross-Validation: Split training data into k folds; each fold used once for validation
Model Evaluation: Confusion Matrix
Binary Classification Example
Suppose we have the following results from a binary classifier:
- True Positives (TP): 50
- False Negatives (FN): 10
- False Positives (FP): 5
- True Negatives (TN): 100
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP = 50 | FN = 10 |
| Actual Negative | FP = 5 | TN = 100 |
Calculations:
- Recall (Sensitivity): Recall = TP / (TP + FN) = 50 / (50 + 10) = 50 / 60 ≈ 0.833
- Precision: Precision = TP / (TP + FP) = 50 / (50 + 5) = 50 / 55 ≈ 0.909
- Accuracy: Accuracy = (TP + TN) / Total = (50 + 100) / (50 + 100 + 5 + 10) = 150 / 165 ≈ 0.909
- F1 Score: F1 = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.909 * 0.833) / (0.909 + 0.833) = 2 * 0.757 / 1.742 ≈ 0.87
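The same calculations as a small Python helper (my addition; the numbers match the worked example above):

```python
def classification_metrics(tp, fn, fp, tn):
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, accuracy, f1

recall, precision, accuracy, f1 = classification_metrics(tp=50, fn=10, fp=5, tn=100)
print(round(recall, 3), round(precision, 3), round(accuracy, 3), round(f1, 2))
# 0.833 0.909 0.909 0.87
```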
ROC & AUC
- ROC Curve: Plots TPR vs FPR
- AUC (Area Under Curve): Indicates model’s ability to distinguish between classes. 1 = perfect, 0.5 = random guessing
Multi-Class Classification Example
Suppose we have a 3-class problem:
Class | TP | FP | FN |
---|---|---|---|
A | 40 | 10 | 5 |
B | 30 | 15 | 10 |
C | 50 | 5 | 20 |
Per-Class Metrics:
- Class A:
  - Precision = 40 / (40 + 10) = 0.80
  - Recall = 40 / (40 + 5) = 0.89
- Class B:
  - Precision = 30 / (30 + 15) = 0.67
  - Recall = 30 / (30 + 10) = 0.75
- Class C:
  - Precision = 50 / (50 + 5) = 0.91
  - Recall = 50 / (50 + 20) = 0.71
Macro-Averaged Metrics:
- Macro Precision: (0.80 + 0.67 + 0.91) / 3 ≈ 0.793
- Macro Recall: (0.89 + 0.75 + 0.71) / 3 ≈ 0.783
- Macro F1 Score:
  - F1_A = 2 * (0.80 * 0.89) / (0.80 + 0.89) ≈ 0.84
  - F1_B = 2 * (0.67 * 0.75) / (0.67 + 0.75) ≈ 0.71
  - F1_C = 2 * (0.91 * 0.71) / (0.91 + 0.71) ≈ 0.80
  - Macro F1 = (0.84 + 0.71 + 0.80) / 3 ≈ 0.783
Micro-Averaged Metrics:
- Total TP = 120, FP = 30, FN = 35
- Micro Precision = 120 / (120 + 30) = 0.80
- Micro Recall = 120 / (120 + 35) ≈ 0.774
- Micro F1 = 2 * (0.80 * 0.774) / (0.80 + 0.774) ≈ 0.787
PCA (Principal Component Analysis)
Goal: Reduce dimensionality while preserving variance
PCA Steps:
- Standardize the data (subtract the mean, divide by the standard deviation)
- Compute the covariance matrix S = (1/n) * X^T * X
- Compute the eigenvalues and eigenvectors of the covariance matrix
- Sort the eigenvalues in descending order and pick the top k eigenvectors
- Project the original data X onto the new subspace using the selected eigenvectors
Worked Example: Suppose we have 2D points: (2, 0), (0, 2), (3, 1), (1, 3)
- Standardize → subtract mean: mean_x = 1.5, mean_y = 1.5
- Centered data: (0.5, -1.5), (-1.5, 0.5), (1.5, -0.5), (-0.5, 1.5)
- Covariance matrix S = [[1.25, -0.75], [-0.75, 1.25]] (using the 1/n convention above)
- Eigenvalues: λ1 = 2.0, λ2 = 0.5 → keep the eigenvector for λ1 = 2.0 (direction (1, -1)/√2)
- Project the centered data onto that principal direction
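A NumPy sketch of those steps on the worked example (my addition; assumes NumPy is available and uses the 1/n covariance convention above):

```python
import numpy as np

X = np.array([[2, 0], [0, 2], [3, 1], [1, 3]], dtype=float)
X_centered = X - X.mean(axis=0)                  # subtract the mean of each feature

S = (X_centered.T @ X_centered) / len(X)         # covariance matrix, 1/n convention
eigvals, eigvecs = np.linalg.eigh(S)             # eigen-decomposition (S is symmetric)

order = np.argsort(eigvals)[::-1]                # sort eigenvalues descending
top_vec = eigvecs[:, order[0]]                   # principal direction
projected = X_centered @ top_vec                 # 1-D projection of the data

print(S)                # [[ 1.25 -0.75] [-0.75  1.25]]
print(eigvals[order])   # [2.  0.5]
print(projected)
```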
Decision Trees
Used for classification by creating splits on attributes.
How to choose split?
- Gini Index:
Gini(D) = 1 - Σ (p_i)^2
- Information Gain:
IG(D, A) = Entropy(D) - Σ (|D_i| / |D|) * Entropy(D_i)
Gini Index Example: Consider a binary dataset with 6 samples: 4 Positive, 2 Negative
Gini = 1 - (4/6)^2 - (2/6)^2 = 1 - 0.444 - 0.111 ≈ 0.444
Split on an attribute A results in:
- Left split: 2P, 1N → Gini = 1 - (2/3)^2 - (1/3)^2 = 0.444
- Right split: 2P, 1N → same Gini = 0.444 Overall Gini = 0.444
Information Gain Example: Original entropy:
Entropy = - (4/6)log2(4/6) - (2/6)log2(2/6) ≈ 0.918
If attribute split gives:
- Left: 2P, 1N → Entropy = 0.918
- Right: 2P, 1N → same entropy
IG = 0.918 - (3/6)(0.918) - (3/6)(0.918) = 0, so there is no gain; try a different attribute
Linear Regression
Goal: Predict a continuous output from a linear combination of features. Essentially a glorified y = mx + b, but with as many variables (the x's) as needed to represent all the features. The main goal is to find the weight vector (the m's) that minimizes the overall RSS.
Model:
y = w_0 + w_1x_1 + w_2x_2 + ... + w_nx_n
Loss Function (RSS):
RSS = Σ(y_i - ŷ_i)^2 = Σ(y_i - (w^T x_i))^2
Gradient Descent Update Rule:
w_j = w_j - α * ∂RSS/∂w_j = w_j - α * Σ(2 * (y_i - ŷ_i) * -x_ij)
Step size is an important hyperparameter: too small and learning is too slow, but too large a step causes the weights to bounce back and forth past the minimum.
Stopping rule: if validation is not changing more than some number, n, stop training.
Example: Dataset: {(x=1, y=2), (x=2, y=4)}. Start with w = 0, b = 0. Predictions: ŷ = 0. Error: e = y - ŷ = [2, 4]. Gradient: ∂RSS/∂w = -2 * (2*1 + 4*2) = -20. Update: w = 0 + α * 20.
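A minimal gradient-descent sketch for that tiny dataset (my own illustration; the learning rate 0.05 and the step count are arbitrary choices):

```python
# Fit y ≈ w * x on {(1, 2), (2, 4)} by minimizing RSS with gradient descent.
xs, ys = [1.0, 2.0], [2.0, 4.0]
w, alpha = 0.0, 0.05                     # initial weight and assumed learning rate

for step in range(200):
    grad = sum(-2 * (y - w * x) * x for x, y in zip(xs, ys))   # dRSS/dw
    w -= alpha * grad                    # step against the gradient
print(round(w, 3))                       # close to 2.0, the true slope
```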
Regularization:
- L1 (Lasso): λ * Σ|w_i| → sparse weights; good for feature selection
- L2 (Ridge): λ * Σw_i^2 → smaller weights; decreases outlier sensitivity
Logistic Regression
Used for binary classification because the sigmoid function is bounded between 0 and 1.
Bernoulli Model:
P(y|x) = σ(w^T x) = 1 / (1 + e^{-w^T x})
Loss Function (Binary Cross Entropy):
L = -Σ [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)]
Multiclass (Softmax): The output of softmax is a list of probabilities summing to 1. The number which is the highest is the most likely class it belongs to.
P(y=j|x) = e^{z_j} / Σ e^{z_k}
Cross-Entropy Loss (Softmax):
L = -Σ y_ij * log(softmax_j)
Gradient:
∂L/∂w_j = Σ (p_j - y_j) * x
Perceptrons and Neural Networks
Perceptron: A single-layer neural network that takes in multiple inputs, applies weights, sums them, and uses an activation function to produce a binary output.
- Training: Perceptrons learn by adjusting weights after each sample. It compares the predicted output with the true label, calculates the error, and updates the weights using the perceptron learning rule.
- Update rule:
w = w + α * (y - ŷ) * x
Example: Suppose we want a perceptron that performs logical AND:
- Input: x1, x2 ∈ {0,1}
- Weights: w1 = 1, w2 = 1, bias = -1.5
- Activation: output = 1 if (x1w1 + x2w2 + bias) > 0, else 0
Truth table:
- x1=0, x2=0 → output = 0
- x1=1, x2=0 → output = 0
- x1=0, x2=1 → output = 0
- x1=1, x2=1 → output = 1
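The AND perceptron above, checked in a few lines of Python (my addition):

```python
def perceptron_and(x1, x2, w1=1.0, w2=1.0, bias=-1.5):
    # Weighted sum followed by a hard threshold activation
    return 1 if (x1 * w1 + x2 * w2 + bias) > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron_and(x1, x2))
# Only (1, 1) produces 1, matching the AND truth table
```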
Multilayer Perceptron (MLP):
- A type of feedforward neural network with one or more hidden layers.
- Forward propagation: Moves information forward through layers.
- Backward propagation: Sends error backward and updates weights using gradients.
- Hidden layers:
- Capture complex, non-linear relationships
- Automatically learn feature combinations and perform dimensionality reduction
- Increasing neurons allows learning more complex patterns
- Increasing hidden layers increases the abstraction and hierarchy in the model
Too many neurons/layers → risk of overfitting. Too few → underfitting.
Activation Function Options:
- Sigmoid:
1 / (1 + e^{-x})
— Not recommended in hidden layers due to vanishing gradient problem, but fine in output layers for binary classification. - Tanh: Similar to sigmoid but ranges [-1,1] — zero-centered.
- ReLU:
max(0, x)
— Default choice for hidden layers; efficient and helps convergence. - Leaky ReLU: Outputs a small negative value for negative inputs instead of zero.
- SELU: Scaled ELU — good for fully connected layers and self-normalizing networks.
Dropout Layers:
- Each neuron has a probability of being ignored (set to 0) during training.
- Used to prevent overfitting and improve generalization.
- Only active during training, not testing.
Block Feature Vectors:
- Used especially for text — splits input into fixed-sized blocks and processes each independently.
Generative vs. Discriminative Models:
- Generative: Model the full data distribution
P(x, y)
to understand and generate data (e.g., Naive Bayes, GANs). - Discriminative: Focus only on
P(y|x)
— learning boundaries between classes (e.g., logistic regression, SVM).
Deep Learning:
- More neurons + more layers = deeper models
- Necessary for feature abstraction and complex data types like text, speech, or images
Convolutional Neural Networks (CNNs):
- Specially designed for image and grid-based data.
- Input: Raw image (matrix of pixels)
- Convolutional filter: Learns local patterns like edges, corners
- Feature map: Output of applying a filter
- Activation function: Adds non-linearity to feature maps
- Pooling layer: Downsamples feature maps, often with max pooling
- Flatten: Converts pooled maps into a 1D vector for fully connected layers
Gradient Optimizers:
- SGD: Stochastic Gradient Descent — simple and widely used
- Adagrad: Adapts learning rate per feature; helps with sparse data
- RMSprop: Adjusts learning rate using a moving average of gradients
- Adam: Combines RMSprop and momentum; widely used for deep learning
Batch Normalization:
- Normalizes activations in hidden layers
- Helps mitigate exploding/vanishing gradients
- Stabilizes and accelerates training
MLP Architecture:
- Input → Hidden Layers → Output
- Use activation functions like ReLU, sigmoid, tanh
Backpropagation:
- Compute error from output
- Propagate error backward through layers
- Update weights using gradient descent
Dropout: Randomly drop neurons during training to reduce overfitting.
Recommender Systems
Collaborative Filtering:
- User-based filtering: Predict the rating of user u on item i from the ratings of similar users v:
  r̂_ui = r̄_u + (Σ sim(u,v) * (r_vi - r̄_v)) / Σ |sim(u,v)|
- Item-based filtering: Predict the rating of user u on item i from the user's own ratings of similar items.
Computation heavy! Hence matrix factorization, which decomposes the ratings matrix into two smaller matrices and creates latent factors that are not explicitly defined but inferred from the data.
Matrix Factorization:
- Approximate R ≈ U * V^T
- Optimize latent features U and V using SGD
- Loss:
(R - UV^T)^2 + λ (||U||^2 + ||V||^2)
Example: User-Item ratings:
M1 M2 M3
U1: 5 ? 3
U2: 4 2 ?
After factorization:
- Estimate U1’s missing rating using dot product of U1-row and M2-column in factorized matrices
Association Rule Mining
Terms:
- Support: Fraction of transactions with both A and B
- Confidence: P(B|A) = support(A and B) / support(A)
Apriori Principle: All subsets of a frequent itemset must be frequent.
Example: Transactions:
- {milk, bread}
- {milk, butter}
- {bread, butter}
- {milk, bread, butter}
Support:
- support({milk, bread}) = 2/4 = 0.5
Confidence:
- confidence(milk → bread) = support(milk & bread) / support(milk) = 0.5 / 0.75 ≈ 0.67
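The same support and confidence numbers, computed directly from the four transactions in Python (my addition):

```python
transactions = [
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

def support(itemset):
    # fraction of transactions that contain every item in itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

print(support({"milk", "bread"}))                        # 0.5
print(support({"milk", "bread"}) / support({"milk"}))     # confidence(milk -> bread) = 0.67
```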
Clustering
K-means Clustering:
- Choose k initial centroids
- Assign each point to the nearest centroid
- Update centroids as the mean of points in each cluster
- Repeat until convergence
Example: Data: [1, 2, 10, 11] Initial centroids: 1 and 10 Iteration 1: [1,2] to Cluster 1, [10,11] to Cluster 2 New centroids: 1.5 and 10.5 → converged
Hierarchical Clustering:
- Agglomerative: Merge closest clusters iteratively
- Divisive: Split cluster until each point stands alone
DBSCAN:
- Parameters: eps and minPts
- Classify core, border, and noise points
- Example: Points (1, 2, 3, 10, 11); eps = 2, minPts = 2
  - Core points {1, 2, 3} → Cluster 1
  - Core points {10, 11} → Cluster 2
Clustering Evaluation
- SSE (Sum of Squared Errors): Measures cohesion
- SSB (Between groups): Measures separation
- Homogeneity: All clusters contain only members of a single class
- Completeness: All members of a given class are in the same cluster
- V-measure: Harmonic mean of homogeneity and completeness
CS 478 - Natural Language Processing
Logistic Regression
Sentiment analysis problem – binary classification (is this tweet positive or negative?).
Feature Extraction: Create a large vector containing every single word in the vocabulary and assign value 1 if the word is present in the tweet or 0 if not.
- Problem: Very sparse vector since most tweets will not have many overlapping words, resulting in slow prediction.
Alternate Strategy:
- Create classes with positive and negative word connotations.
- For a given tweet, count the number of positive and negative words. Whichever class has more words determines the sentiment.
Preprocessing a Tweet:
- Remove common words (“a”, “the”, etc.)
- Remove punctuation
- Remove Twitter handles and URLs
Bag of Words Approach:
- Convert all words to lowercase
- Perform stemming (e.g., “stopped” → “stop”)
- Compare each word to a positive/negative dictionary and count occurrences
Example:
- Tweet: “I loved the new Batman movie! Absolutely amazing.”
- After cleaning: [“love”, “new”, “batman”, “movie”, “absolutely”, “amazing”]
- Match with dictionary: 3 positive words → classify as positive sentiment
Text Normalization
- Tokenization – Identify words in a sentence
- Lemmatization – Normalize words to their root form (“am”, “are”, “is” → “be”)
- Stemming – Remove suffixes (“happily” → “happi”)
- Segmentation – Break text into sentences
Problems with Tokenization:
- Hyphens
- Acronyms
- Contractions
- Multi-word units (“New York”)
Ambiguity:
- Prepositional phrase attachment is confusing (“I saw the man with a telescope”)
Byte Pair Encoding (BPE)
Start with every character as a token and build a dictionary by:
- Merging adjacent tokens
- Tracking frequency
- Keeping merges that occur above a frequency threshold
- Repeating for groups of 3, 4… until frequency drops or no more segments
Example:
- Text: “low lower lowest”
- Initial tokens: [l, o, w], [l, o, w, e, r], [l, o, w, e, s, t]
- Merge most frequent pair: (l, o) → lo
- New tokens: [lo, w], [lo, w, e, r], …
- Continue merging: (lo, w) → low, (low, e) → lowe, etc.
Purpose: Handles rare/unseen words better; enables subword segmentation.
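A rough Python sketch of the merge loop (my own illustration of the idea; ties between equally frequent pairs are broken by count order):

```python
from collections import Counter

def most_frequent_pair(tokenized_words):
    pairs = Counter()
    for word in tokenized_words:
        for a, b in zip(word, word[1:]):            # count adjacent token pairs
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(tokenized_words, pair):
    merged = []
    for word in tokenized_words:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])   # merge the pair into one token
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged.append(out)
    return merged

words = [list("low"), list("lower"), list("lowest")]
for _ in range(2):                                   # two merge rounds: (l, o) then (lo, w)
    pair = most_frequent_pair(words)
    words = merge_pair(words, pair)
    print(pair, words)
```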
N-Gram Modeling
Predict the next word based on the n-1 previous words.
- Unigram: P(word)
- Bigram: P(w2 | w1)
- Trigram: P(w3 | w1, w2)
Example: Corpus: "I love NLP. I love AI." (worked out in the sketch further below)
- Bigrams: ("I love"), ("love NLP"), ("love AI")
- P("love" | "I") = 2/2 = 1
- P("AI" | "love") = 1/2 = 0.5
Issues: Zero probability for unseen n-grams.
Solutions:
- Laplace Smoothing: Add 1 to all counts.
- Interpolation: Combine uni-, bi-, and tri-gram models linearly.
- Backoff: If trigram not seen, use bigram, then unigram.
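The example bigram probabilities above, computed with a tiny Python model (my addition; the period is kept as its own token for simplicity):

```python
from collections import Counter

tokens = "I love NLP . I love AI .".split()
bigrams = list(zip(tokens, tokens[1:]))

unigram_counts = Counter(tokens)
bigram_counts = Counter(bigrams)

def bigram_prob(w1, w2):
    # P(w2 | w1) = count(w1 w2) / count(w1)
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

print(bigram_prob("I", "love"))    # 1.0
print(bigram_prob("love", "AI"))   # 0.5
```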
Evaluation Metrics
- Log-Likelihood: Measures how likely a model generated a sequence. Avoids underflow by summing log probabilities.
LL = Σ log P(w_i | w_{i-1})
- Cross Entropy: H(p) = -1/N Σ log_2 P(w_i); lower is better
- Perplexity: PP = 2^H(p)
  - Lower perplexity = better; interpreted as the average number of choices per word
Example: Suppose cross-entropy H(p) = 1.5 → PP = 2^1.5 ≈ 2.828 → model has ~2.8 likely choices per word
Decoding Strategies
- Greedy: Always pick the highest probability word
- Beam Search: Explore the top n paths at each step and keep the best long-term one
- Top-k Sampling: Randomly choose from top-k most likely words
Neural Networks
See Data Mining section. Same architecture and training applies.
Zipf’s Law
The frequency of a word is inversely proportional to its rank:
- Rank 1: “the” → appears 22000 times
- Rank 2: “of” → appears 11000 times
Word Embeddings
Embeddings are dense vector representations of words, often trained via:
- Word2Vec
- CBOW: Predict target from context
- Skip-gram: Predict context from target
Distributional Representation:
- Based on co-occurrence — similar words share contexts
- E.g., “cat” and “dog” occur with “fur”, “pet”, “tail” → similar vectors
Distributed Representation:
- Encodes concepts on separate dimensions
- E.g., “bank” → [finance_dim=1, river_dim=0.3]
Count-Based Embeddings
Build word-context matrix using frequency statistics
TF-IDF (Term Frequency-Inverse Document Frequency)
TF = (count of term in doc) / (total terms in doc)
IDF = log(N / df)
TF-IDF = TF * IDF
Example:
- Term “deep” appears 3 times in a doc of 100 words
- Appears in 5 of 1000 docs
TF = 3 / 100 = 0.03
IDF = log10(1000 / 5) = log10(200) ≈ 2.30
TF-IDF = 0.03 * 2.30 ≈ 0.069
PMI (Pointwise Mutual Information)
PMI(w, c) = log( P(w, c) / (P(w) * P(c)) )
Example:
- P(w, c) = 0.001; P(w) = 0.01; P(c) = 0.005
PMI = log10(0.001 / (0.01 * 0.005)) = log10(20) ≈ 1.30
- TF-IDF: good for document classification/search
- PMI: useful for word similarity
Word-Context Matrix
A co-occurrence matrix where:
- Rows = words
- Columns = context words (e.g., 5 words before/after)
- Values = count or normalized metric (PMI, TF-IDF)
Handling Long-Distance Dependencies
Problem: Words can depend on faraway tokens
Solution: Recurrent Neural Networks (RNN)
- Include feedback loop to maintain internal state
- Struggles with long-term dependencies due to vanishing gradients
LSTM (Long Short-Term Memory):
- Introduces memory cells and gates (input, forget, output)
- Maintains long-term context using additive updates
CS 485 - Robotics
Note: This class was actually so confusing, so I don’t have many detailed notes.
Rigid Body Transformations
Rigid body transformations involve orientation and position. These include translations and rotations, which are typically represented as matrix transformations.
Example of matrix representation [R|t] (2x3):
| R11 R12 t1 |
| R21 R22 t2 |
Where R is a 2x2 rotation matrix and t is the translation vector.
Pose = Position + Orientation
Global Warping
Applies a fixed set of operations (e.g., rotate, scale) to every point in space, usually represented as matrix operations.
2x2 Matrix Examples:
- Scaling:
  | sx  0 |
  |  0 sy |
- Rotation (angle θ):
  | cos(θ) -sin(θ) |
  | sin(θ)  cos(θ) |
- Reflection (over the y-axis):
  | -1 0 |
  |  0 1 |
- Shear (x-direction):
  | 1 shx |
  | 0   1 |
Translation in Homogeneous Coordinates
Translation requires an extra coordinate (homogeneous coordinate).
3x3 Translation Matrix:
| 1 0 tx |
| 0 1 ty |
| 0 0 1 |
A transformation using homogeneous coordinates is considered affine if the last row is [0, 0, 1]. A homography is a projective transformation where the last row is [e, f, 1].
Layers of Transformation
Type | DOF | Preserves |
---|---|---|
Translation | 2 | Straight lines, parallelism, angles, lengths, orientation |
Rigid/Euclidean | 3 | Straight lines, parallelism, angles, lengths |
Similarity | 4 | Straight lines, parallelism, angles |
Affine | 6 | Straight lines, parallelism |
Projective | 8 | Straight lines |
Matrix Representations:
- Translation:
  | 1 0 tx |
  | 0 1 ty |
  | 0 0 1 |
- Rigid:
  | cos(θ) -sin(θ) tx |
  | sin(θ)  cos(θ) ty |
  | 0       0      1 |
- Similarity:
  | s*cos(θ) -s*sin(θ) tx |
  | s*sin(θ)  s*cos(θ) ty |
  | 0         0        1 |
- Affine:
  | a b tx |
  | c d ty |
  | 0 0 1 |
- Projective:
  | a b tx |
  | c d ty |
  | e f 1 |
Inverse Kinematics
You know the end-effector’s position and want to determine the joint angles or movements to get there.
- Solving triangle geometry backwards.
- Adds complexity in 3D due to orientation.
TF (Transform Library in ROS): Handles frame tracking and transformations in ROS.
Motion Control
Turns high-level goals into actuator commands.
- Controller sends commands to actuators to affect system state.
Types:
- Open-loop: No feedback from environment (no sensors).
- Closed-loop: Feedback from sensors is used to update actions.
PID Controller (Proportional-Integral-Derivative)
A closed-loop system that combines three components:
- Proportional (P): Based on current error
- Integral (I): Accumulated past error
- Derivative (D): Rate of change of error
PID Equation:
u(t) = K_p * e(t) + K_i * ∫e(t)dt + K_d * de(t)/dt
Where:
- u(t) is the control output
- e(t) is the error at time t
- K_p, K_i, K_d are tuning constants
Example:
- Desired position = 10 units
- Current = 7 units → error = 3
- Proportional term = Kp * 3
- Integral term = Ki * total accumulated error
- Derivative = Kd * (change in error)
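A stripped-down PID loop in Python (my own toy example: the "plant" just adds the command to the position each step, and the gains are arbitrary):

```python
def pid_step(error, prev_error, integral, kp=0.8, ki=0.1, kd=0.2, dt=1.0):
    integral += error * dt                      # accumulate past error (I term)
    derivative = (error - prev_error) / dt      # rate of change of error (D term)
    u = kp * error + ki * integral + kd * derivative
    return u, integral

target, position = 10.0, 7.0
integral, prev_error = 0.0, target - position
for t in range(10):
    error = target - position
    u, integral = pid_step(error, prev_error, integral)
    position += u                               # toy plant: the command moves the position
    prev_error = error
    print(t, round(position, 2))                # position approaches the target of 10
```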
Motion Planning
Motion control alone doesn’t work in complex environments.
ROS Global Planner:
- Uses A* or Dijkstra’s algorithm on a global costmap
- Creates a global path with waypoints
ROS Local Planner:
- Uses local costmap and dynamic window approach
- Computes small, feasible trajectories around obstacles
Combine both: Global planner gives long-term direction; local planner handles short-term pathing and obstacle avoidance.
Maze Solving Algorithms:
- BFS
- DFS
- Heuristic Informed (A*)
- Optimal Informed (Dijkstra’s)
- Iterative Deepening
Vertical Cell Decomposition (VCD):
- Split environment vertically between obstacles
- Build connections between accessible vertical strips
Example of VCD:
- Obstacles represented as polygons
- Extend vertical lines up/down from each vertex
- Connect open areas → build graph
Other Planning Tools:
- Visibility Graph: Straightest possible path, but may lead to sharp turns
- Voronoi Diagram: Equidistant from obstacles → safer but longer path
Search Strategies:
- Greedy Best-First: Always choose the node closest to the goal (not always optimal)
- Uniform Cost Search (UCS): Expands path with lowest cost (same as BFS if all costs equal)
- A*: Combines UCS and Greedy using cost + heuristic
Filtering (State Estimation)
Assumptions:
- Markov: Only the previous state matters
- Stationary process: Observation probabilities are constant
- Prior state: Initial state is known or estimated
Types:
- Discrete: Hidden Markov Models (HMM)
- Continuous: Kalman Filter (linear + Gaussian)
- Non-Gaussian: Particle Filter (samples possible states)
Joint Probability Distribution:
P(x_1, x_2, ..., x_t, z_1, z_2, ..., z_t) = P(x_1) * Π P(x_t | x_{t-1}) * Π P(z_t | x_t)
Prediction Step:
P(x_t | z_{1:t-1}) = ∫ P(x_t | x_{t-1}) * P(x_{t-1} | z_{1:t-1}) dx_{t-1}
Update Step:
P(x_t | z_{1:t}) ∝ P(z_t | x_t) * P(x_t | z_{1:t-1})
CS 367 - Systems Programming
Bitwise Operations
Everything in computers is in bits (1s and 0s), which are easy to store due to their binary nature (on/off).
Bitwise operations are performed on individual bits:
- Shift operations
- Bit masking
- Logical operations (AND, OR, NOT, XOR)
Logical Shift: Used with unsigned integers. Leftmost bits are filled with 0s after a right shift.
Arithmetic Shift: Used with signed integers. Leftmost bits are filled with the sign bit (1 for negative, 0 for positive) to preserve the sign.
Example:
Logical Right Shift: 1100 0000 >> 2 = 0011 0000
Arithmetic Right Shift: 1100 0000 >> 2 = 1111 0000 (preserve sign)
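Python integers aren't fixed-width, so here's a small sketch (my addition) that emulates 8-bit logical vs. arithmetic right shifts to reproduce the example above:

```python
def logical_right_shift(value, n, bits=8):
    mask = (1 << bits) - 1
    return (value & mask) >> n                  # fill with 0s from the left

def arithmetic_right_shift(value, n, bits=8):
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):               # sign bit set: treat as negative
        value -= (1 << bits)                    # interpret as two's complement
    return (value >> n) & ((1 << bits) - 1)     # Python's >> sign-extends for negatives

x = 0b11000000
print(f"{logical_right_shift(x, 2):08b}")       # 00110000
print(f"{arithmetic_right_shift(x, 2):08b}")    # 11110000
```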
Memory and Addressing
Memory is like a large array of bytes. Programs refer to data using addresses. Each process is assigned a private address space.
- Word size: Number of bits CPU can address (e.g., 64-bit → 2^64 bytes)
- Byte-ordering:
- Little endian: Least significant byte stored at the lowest address (checked in the snippet after this list)
- Big endian: Most significant byte stored at the lowest address
Example:
Value: 0x12345678
Little endian: 78 56 34 12
Big endian: 12 34 56 78
- RISC: All instructions are the same size (fixed-length encoding)
- Negative values are represented using two’s complement
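A quick runtime check of byte order (my own sketch): print the bytes of 0x12345678 and compare with the example above.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t value = 0x12345678;
    unsigned char *bytes = (unsigned char *)&value;  // view the same word byte by byte
    printf("bytes in memory: %02x %02x %02x %02x\n",
           bytes[0], bytes[1], bytes[2], bytes[3]);
    // Little endian prints: 78 56 34 12   Big endian prints: 12 34 56 78
    printf("%s endian\n", bytes[0] == 0x78 ? "little" : "big");
}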
Integer Casting & Overflow
- Casting up:
- If unsigned → pad with 0
- If signed → pad with MSB
- Casting down:
- Truncation (may lose data)
Overflow Detection:
- Unsigned: Overflow if sum < operands
- Signed: Overflow if sign changes incorrectly
Integer Multiplication via Bit Shift:
x * 8 = x << 3
x * 12 = (x << 3) + (x << 2)
Integer Division via Bit Shift:
x / 8 = x >> 3 (only works for powers of 2)
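Sanity-checking the shift identities in C (for unsigned values; negative signed values round differently under >> than under division):

#include <stdio.h>

int main(void) {
    unsigned int x = 13;
    printf("x * 8  = %u, x << 3              = %u\n", x * 8, x << 3);
    printf("x * 12 = %u, (x << 3) + (x << 2) = %u\n", x * 12, (x << 3) + (x << 2));
    printf("x / 8  = %u, x >> 3              = %u\n", x / 8, x >> 3);
}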
Floating Point Representation
Notation: (-1)^s * M * 2^E
7-bit floating point layout:
- 1 bit sign
- 3 bit exponent
- 3 bit fraction
Bias: bias = 2^(e-1) - 1
Example: 6.5
- Binary: 110.1 =
1.101 x 2^2
- Sign = 0
- Frac = 101
- Exp = 2 + 3 (bias) = 5 = 101
- Final:
0 101 101
Types of Floating Point Values:
- Normalized: `00...0 < exp < 11...1`
- Special: `exp = 111` → INF/NaN
- Denormalized: `exp = 000`
Denormalized Example: 3/16
- Binary: 0.0011 = `1.1 x 2^-3`
- Bias = 3, Exp = -3 → 0 → denormalized
- New definitions:
- E = 1 - bias = -2
- M = 0.11 → Frac = 110
- Final:
0 000 110
Special Values:
- `1 111 000` = -INF
- `0 111 000` = +INF
- `0 111 001` = NaN
Floating Point Multiplication:
(-1)^s1 * m1 * 2^e1 × (-1)^s2 * m2 * 2^e2
= (-1)^(s1+s2) * (m1 × m2) * 2^(e1 + e2)
Structs & Memory Alignment
Structs are laid out in memory in declaration order, and alignment rules dictate that data types must start at offsets that are multiples of their size.
Example:
struct c {
char c; // offset 0 (1 byte)
int a[2]; // offset 4 & 8 (each int = 4 bytes)
long b; // offset 16 (8 bytes)
};
- Total size = 24 bytes (padded to be multiple of 8)
Optimized Layout:
struct c {
long b; // offset 0
int a[2]; // offset 8 & 12
char c; // offset 16
};
- Padding only at the end
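The offsets and total sizes can be checked with offsetof/sizeof; a small sketch assuming a typical 64-bit ABI:

#include <stdio.h>
#include <stddef.h>

struct original  { char c; int a[2]; long b; };
struct reordered { long b; int a[2]; char c; };

int main(void) {
    printf("original:  c=%zu a=%zu b=%zu size=%zu\n",
           offsetof(struct original, c), offsetof(struct original, a),
           offsetof(struct original, b), sizeof(struct original));
    printf("reordered: b=%zu a=%zu c=%zu size=%zu\n",
           offsetof(struct reordered, b), offsetof(struct reordered, a),
           offsetof(struct reordered, c), sizeof(struct reordered));
}

On a typical x86-64 ABI both layouts still come out to 24 bytes here; reordering pays off more when it removes interior padding outright.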
Dynamic Allocation:
- Blocks are double-word (8-byte) aligned: the payload starts at an address that is a multiple of 8
- Header contains size & allocation status
Allocation Strategies:
- First fit, next fit, best fit
- Free block merging to reduce fragmentation
Garbage Collection:
- Memory blocks as nodes, pointers as edges
- Unreachable nodes are removed
Exception Handling
Control Flow: Handles unexpected events (interrupts, system calls, faults)
Types of Exceptions:
- Interrupts: Asynchronous (e.g., I/O devices)
- Traps: System calls (e.g., `read`, `fork`, `exec`)
- Faults: Illegal memory access (e.g., segmentation fault)
- Aborts: Unrecoverable hardware errors
Microkernel vs Monolithic Kernel:
- Microkernel: Minimal functionality in the kernel; most services run in user space (Windows NT and macOS are hybrids influenced by this design)
- Monolithic: Everything runs in the kernel with broad access; customizable but riskier (e.g., Linux)
Context Switching: Kernel maintains:
- Page table, file table, process table, registers, stack
Process Lifecycle:
- `fork()` → Creates child process
- `exit()` → Terminates process
- `wait()` or `waitpid()` → Parent reaps child
Zombie State: Child terminated but not yet reaped
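A minimal fork/exit/wait sketch (my own example) showing the parent reaping the child so it doesn't linger as a zombie:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();                  // create child process
    if (pid == 0) {
        printf("child: pid=%d\n", (int)getpid());
        exit(42);                        // child terminates (zombie until reaped)
    }
    int status;
    waitpid(pid, &status, 0);            // parent reaps the child
    if (WIFEXITED(status))
        printf("parent: child %d exited with %d\n", (int)pid, WEXITSTATUS(status));
}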
Signals: Notify process of events
- Used for interprocess communication
- Propagated to process groups (e.g., pipelines)
sigaction Example:
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void handler(int sig) {
    printf("Caught signal %d\n", sig);   // demo only: printf is not async-signal-safe
}

int main(void) {
    struct sigaction sa;
    sa.sa_handler = handler;        // install the handler
    sigemptyset(&sa.sa_mask);       // block no extra signals while handling
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);   // catch Ctrl-C
    while (1) pause();              // wait for signals
}
Race Condition Prevention: Block signals before forking, unblock afterward
Virtual Memory
Virtual memory gives each process an isolated address space. Pages are swapped in/out of disk.
- Page fault = page not in memory → OS brings it in
- Page Table maps virtual pages to physical frames
- TLB (Translation Lookaside Buffer): Caches page table entries
Translation:
- Virtual Address → VPN + Offset
- VPN → TLB lookup; on a hit, the TLB supplies the PPN → Physical Address = PPN + Offset
Full Virtual Address Translation Example
Simulates how a virtual address is translated to a physical address, and then used to access the cache, TLB, page table, and finally main memory.
System Configuration
Parameter | Value |
---|---|
Virtual Address Size | 16 bits |
Physical Address Size | 16 bits |
Page Size | 4 KB = 2¹² bytes |
Offset Bits | 12 bits |
VPN Bits | 4 bits |
TLB Entries | 4 |
Cache Lines | 4 |
Cache Block Size | 1 word (4 bytes) |
Example Virtual Address
We’re translating:
Virtual Address (VA) = 0x1234
Binary: 0001 0010 0011 0100
- VPN (Virtual Page Number) = first 4 bits = `0001` = `0x1`
- Page Offset = last 12 bits = `0x234`
Step 1: TLB Lookup
TLB Table (before lookup):
Index | VPN | PPN | Valid |
---|---|---|---|
0 | 0x0 | 0x3 | 1 |
1 | 0x2 | 0x4 | 1 |
2 | 0x3 | 0x6 | 1 |
3 | 0x4 | 0x2 | 1 |
- Lookup VPN = 0x1
- Not found → TLB Miss
Step 2: Page Table Lookup
Page Table:
VPN (Index) | PPN | Valid |
---|---|---|
0x0 | 0x3 | 1 |
0x1 | 0xA | 1 ✅ |
0x2 | 0x4 | 1 |
0x3 | 0x6 | 1 |
- VPN `0x1` → PPN `0xA`
- OS loads TLB entry: (VPN 0x1 → PPN 0xA)
Step 3: Construct Physical Address
Combine:
- PPN (Physical Page Number) = `0xA`
- Offset = `0x234`
Physical Address (PA) = (PPN << 12) | Offset
= (0xA << 12) | 0x234
= 0xA000 | 0x234
= 0xA234
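The same construction in C, treating the page table as a plain VPN-indexed array (a sketch of just this step, not a full MMU):

#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 12
#define NUM_PAGES   16   // 4 VPN bits

int main(void) {
    uint16_t page_table[NUM_PAGES] = {0x3, 0xA, 0x4, 0x6};   // VPN → PPN, as in the example
    uint16_t va  = 0x1234;
    uint16_t vpn = va >> OFFSET_BITS;                // 0x1
    uint16_t off = va & ((1 << OFFSET_BITS) - 1);    // 0x234
    uint16_t pa  = (page_table[vpn] << OFFSET_BITS) | off;
    printf("VA=0x%04x -> VPN=0x%x, offset=0x%03x, PA=0x%04x\n", va, vpn, off, pa);
}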
Step 4: Cache Lookup
Let’s assume a direct-mapped cache with 4 lines:
- Cache index bits: 2 bits → `2^2 = 4` lines
- Block offset bits: 2 bits (one 4-byte word per block)
Break PA (0xA234):
- Binary: `1010 0010 0011 0100`
- Tag: upper 12 bits = `1010 0010 0011` (0xA23)
- Index: bits 2–3 = `01` → Line 1
- Offset: bits 0–1 = `00`
Cache Table:
Line | Valid | Tag | Data |
---|---|---|---|
0 | 1 | 0xA20 | 0xFFFF |
1 | 1 | 0xA21 | 0x1234 |
2 | 0 | – | – |
3 | 1 | 0xB10 | 0xF00D |
- Index = `1`, Tag in PA = `0xA23` ≠ Tag in cache = `0xA21` → Cache Miss
Step 5: Access Main Memory
- Cache miss → Load data from physical memory at 0xA234
- Cache line 1 is updated with tag `0xA23` and the fetched value
Summary Flow
Virtual Address = 0x1234
├──> TLB Lookup
│ └── MISS
│
├──> Page Table Lookup
│ └── VPN 0x1 → PPN 0xA
│
├──> Construct PA = 0xA234
│
├──> Cache Lookup @ Index 1
│ └── MISS (tag mismatch)
│
└──> Load from Physical Memory[0xA234] → Update Cache
Note: the cache in this example is physically addressed; the tags it stores come from physical addresses.
Assembly Language
Compiled code translates to assembly, which consists of simple operations and direct memory access.
Operand Types:
- `$` = constant (immediate value)
- `%` = register
- `0xADDR` = memory address
Example:
movq %rax, %rbx # copy rax to rbx
Registers:
- 8-bit: `%al`, `%bl`, `%cl`, `%dl`
- 16-bit: `%ax`, `%bx`, `%cx`, `%dx`
- 32-bit: `%eax`, `%ebx`, `%ecx`, `%edx`
- 64-bit: `%rax`, `%rbx`, `%rcx`, `%rdx`
Jump Example:
start:
cmp $1, %eax
je equal
jmp end
equal:
mov $1, %ebx
end:
Caller-saved: The callee may overwrite these, so the caller must save them if it needs their values after the call
Callee-saved: Must be saved and restored by the callee before returning
C Function vs Assembly:
int add(int a, int b) {
return a + b;
}
add:
mov %rdi, %rax # copy a to return register
add %rsi, %rax # add b
ret
Memory Locality and Hierarchy
Temporal Locality: Reuse of recently accessed data
Spatial Locality: Accessing nearby data (e.g., arrays)
Hierarchy:
- Register – Fastest
- Cache – L1/L2
- Main Memory – DRAM
- Disk – Persistent storage
Static RAM: Cache, fast but costly
Dynamic RAM: Main memory, dense
Performance Metrics:
- Transfer latency
- Rotational latency
- Seek time
Cache Example
Address broken into segments:
- Tag: Uniquely identifies block
- Line: Row in cache
- Offset: Byte within block
Example Table:
Address: 0x1234AB
Tag: 0x12
Line: 0x34
Offset: 0xAB
Formulas:
- `2^tag` memory blocks can map to the same line
- `2^line` lines (sets)
- `2^offset` bytes per block
CS 471 - Operating Systems
Note: Half of OS is a repeat of CS 367 because Computer Engineering students also take this course.
OS Refresher
- Von Neumann architecture
- Virtualization (Virtual Memory)
- Provides system calls as standard library
- Resource manager
- Multithreading
- Volatile vs Non-volatile memory
- Abstractions
- Provides protections
- Isolates processes
Processes
- Program: Code stored on disk
- Process: When the code is loaded and running
States:
- Running
- Ready
- Blocked
- Terminated
Concurrency
- Each thread has its own set of registers
- Context switch with threads and processes works very similarly
- State of each process → moved to Process Control Block (PCB)
- State of each thread → moved to Thread Control Block (TCB)
- Thread context switches do not usually involve paging (same address space)
- Global variables must be protected to ensure mutual exclusion
- Race conditions can lead to nondeterministic behavior
Direct Execution
- Create entry on process list
- Allocate memory for the program
- Load program into memory
- Set up stack and clear registers
- Run main
- Return from main
- Free memory
- Remove from process list
Problem: No time-sharing
Limited Direct Execution
- Process runs in user mode (can access program files, registers)
- For I/O: switch to kernel mode (has unrestricted access)
Trap Instruction → switch to kernel
Return from Trap → switch back to user mode
Execution Flow:
- Create entry on process list
- Allocate memory for the program
- Load program into memory
- Set up stack and clear registers
- Run main
- System call (trap)
- Save registers to stack
- Kernel mode
- Handle trap
- Return to user mode
- Restore registers
- Return from main
- Free memory
- Remove from process list
When to Context Switch
- Option 1 (Cooperative): Switch when the process makes a system call
- Problem: If no system calls, a program may hog the CPU
- Option 2 (Non-cooperative): Timer interrupts trigger a switch
Two Saves:
- Timer interrupt → Hardware saves user register state using kernel stack
- OS switches process → Kernel register state saved by OS
CPU Scheduling
Assumptions:
- Jobs run for a certain time
- All arrive at the same time
- Once started, runs to completion
- No I/O
- Runtime is known
Scheduling Algorithms:
- FCFS: Simple, but short tasks wait for long ones
- Shortest Job First: Optimal if jobs known in advance
- Shortest Time to Completion: Preemptive version of SJF, allows shorter tasks to interrupt
- Round Robin: Fixed time slice, good for time-sharing but can delay long tasks
- With I/O: While one task does I/O, another can run
Real World: OS doesn’t know how long each task takes
Multilevel Feedback Queue (MLFQ)
- Multiple priority queues
- Jobs move up/down levels
- Prevents starvation (but can still happen with many high-priority tasks)
Proportional Share Scheduling:
- Based on CPU usage
- Ensures all processes get some CPU
Lottery Scheduling:
- Tasks receive a number of tickets
- Random winner per scheduling round
- Prevents starvation using randomness
- Example: Task A gets tickets 1-10, Task B gets 11-20
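A toy lottery draw in C using the ticket ranges from the example (sketch only; a real scheduler would pick among runnable tasks each tick):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    srand((unsigned)time(NULL));
    int total_tickets = 20;                        // Task A: 1-10, Task B: 11-20
    for (int round = 0; round < 5; round++) {
        int winner = rand() % total_tickets + 1;   // draw a ticket
        printf("round %d: ticket %2d -> %s runs\n",
               round, winner, winner <= 10 ? "Task A" : "Task B");
    }
}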
Linux Scheduler:
- Completely Fair Scheduler (CFS)
- Priority-based
- Tracks virtual runtime
- Uses Red-Black Tree to manage scheduling efficiently
Multiprocessor Scheduling:
- Each CPU has its own cache
- Requires hardware support
Memory: Base and Bounds (Dynamic Relocation)
- Two registers: Base and Bounds
Physical Address = Virtual Address + Base
- Bounds = upper memory limit
- Prevents processes from accessing memory outside their segment → segfault
- Base and bounds updated on each context switch
Livelock
- Two processes constantly changing state without progress
- Solution: Interrupts / busy-wait handling
Data Transfer
- Programmed I/O: CPU actively manages data transfer
- DMA (Direct Memory Access): CPU programs the DMA controller, which moves data between the device and memory and interrupts the CPU when done
Disk Performance
Latency = Seek Time + Rotational Latency + Transfer Time
Disk Scheduling:
- FIFO: Worst performance
- Shortest Positioning Time First (SPTF): Fast, but risks starvation
- SCAN: Sweep from one end, services along the way (middle gets priority)
- C-SCAN: Sweep one direction only, resets without servicing on return
RAID (Redundant Array of Inexpensive Disks)
Appears as one disk to OS, provides performance, capacity, and reliability.
- RAID-0 (Striping):
- No redundancy
- Capacity = N
- No fault tolerance
- RAID-1 (Mirroring):
- Store exact copy
- Capacity = N/2
- Fault-tolerant
- RAID-4 (Parity):
- One disk holds parity info
- Capacity = N - 1
- Higher latency for random writes
- RAID-5 (Rotating Parity):
- Parity spread across disks
- Reduces write latency
CS 475 - Concurrent and Distributed Systems
Overview
- Concurrent: Multiple tasks on the same computer, using scheduling to switch between them.
- Parallelism: Multiple CPUs running tasks truly simultaneously.
- Distributed: Multiple computers sharing a task over a network.
Multithreading
- Each thread has its own stack within the overall stack and its own control flow.
- Shared memory (like global variables) → risk of race conditions.
- Threading is less expensive (doesn’t require its own page table).
Race Condition Example: Thread 1 and Thread 2 both read value 5. Each subtracts 1. Both write 4 back → expected result was 3.
Solution: Use locks to restrict access to shared memory.
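A sketch of the fix with a pthread mutex (compile with -pthread); without the lock, the two decrements can interleave exactly as described above:

#include <pthread.h>
#include <stdio.h>

int balance = 5;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *withdraw(void *arg) {
    pthread_mutex_lock(&lock);    // only one thread may read-modify-write at a time
    balance = balance - 1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, withdraw, NULL);
    pthread_create(&t2, NULL, withdraw, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("balance = %d\n", balance);  // 3 with the lock, as expected
}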
Locks and Synchronization
- Atomic Instruction: The read and the write happen as one indivisible step.
- Basic Lock Functions:
acquire()
release()
C-style Mutex Locks
Ways to implement locks:
- Disable Interrupts:
- Only viable for kernel code.
- Not safe (can be abused).
- Peterson’s Algorithm:
- Works for 2 threads.
- Threads alternate.
- Lots of waiting.
- Test and Set (Spinlock):
- Spin until the lock becomes available.
- Problem: still has race conditions.
- Atomic Test and Set:
- Atomic check + acquire.
- Disable interrupts only when acquiring.
- Compare and Swap:
- If lock is 0 → set to 1, else wait.
- Atomic by hardware.
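As a sketch, here is a spinlock built on an atomic test-and-set using C11 `<stdatomic.h>` (real locks add back-off and fairness):

#include <stdatomic.h>

atomic_flag lock = ATOMIC_FLAG_INIT;

void acquire(void) {
    // atomic test-and-set: returns the old value and sets the flag in one step
    while (atomic_flag_test_and_set(&lock))
        ;  // spin until the previous holder clears the flag
}

void release(void) {
    atomic_flag_clear(&lock);
}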
Condition Variables
Used to control the order of execution among threads.
Bounded Buffer Problem:
- Producer only adds if space is available.
- Consumer only removes if buffer not empty.
Solution:
- Use `wait()` and `signal()`.
- Always put `wait()` in a `while` loop to avoid spurious wakeups and race conditions.
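A condensed bounded-buffer sketch with pthread condition variables (producer/consumer threads omitted); note the while loops around the waits:

#include <pthread.h>

#define MAX 8
int buffer[MAX], fill = 0, use = 0, count = 0;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t not_full = PTHREAD_COND_INITIALIZER, not_empty = PTHREAD_COND_INITIALIZER;

void put(int item) {
    pthread_mutex_lock(&m);
    while (count == MAX)                  // producer waits while the buffer is full
        pthread_cond_wait(&not_full, &m);
    buffer[fill] = item; fill = (fill + 1) % MAX; count++;
    pthread_cond_signal(&not_empty);      // wake a waiting consumer
    pthread_mutex_unlock(&m);
}

int get(void) {
    pthread_mutex_lock(&m);
    while (count == 0)                    // consumer waits while the buffer is empty
        pthread_cond_wait(&not_empty, &m);
    int item = buffer[use]; use = (use + 1) % MAX; count--;
    pthread_cond_signal(&not_full);       // wake a waiting producer
    pthread_mutex_unlock(&m);
    return item;
}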
Semaphores
- Work as both locks and condition variables.
- Binary or counting semaphores.
#include <semaphore.h>

sem_t s;
sem_init(&s, 0, 1); // binary semaphore: initial value 1, shared between threads of this process
sem_wait(&s);       // acquire (decrement; blocks while the value is 0)
sem_post(&s);       // release (increment)
Concurrent Data Structures
- Concurrent Queue: One lock for the entire queue.
- Concurrent Hashtable: One lock per chain (if chaining).
- Concurrent Linked List: Two locks for adjacent nodes.
Distributed Systems
Architecture
- RPC (Remote Procedure Call)
- MOM (Message-Oriented Middleware)
Message Invocation Models
- At least once – e.g., SunRPC
- Exactly once – like a local function
- At most once – e.g., Go/Java
Idempotency
- An operation that has the same effect whether it is executed once or many times (safe to retry).
MapReduce
- Input/output (I/O) is the bottleneck
- Automates parallelization and distribution across machines
- Input: `(key, value)` pairs
- `map()` → intermediate pairs
- `reduce()` → merges all intermediate pairs
Time Synchronization
- Ensure that requests happen in correct order
Techniques:
- NTP (Network Time Protocol): syncs to UTC via satellite/radio
- Cristian's Algorithm: uses round-trip time to sync clocks
- Lamport Clocks: happened-before logic using counters
- Totally Ordered Multicast:
- Broadcast update
- Receive ACKs
- Apply only when everyone has received
Vector Clocks
- Vector of integers (one per process)
- Captures causality
Rules:
- Increment your own clock
- Send your vector with messages
- Upon receiving, merge by taking the element-wise max of the two vectors, then increment your own entry
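A sketch of this receive rule in C, assuming 3 processes and that `me` is this process's index (names are mine):

#include <stdio.h>

#define NPROC 3

// Merge rule on receive: element-wise max, then tick our own entry
void vc_on_receive(int local[NPROC], const int msg[NPROC], int me) {
    for (int i = 0; i < NPROC; i++)
        if (msg[i] > local[i]) local[i] = msg[i];
    local[me]++;
}

int main(void) {
    int local[NPROC] = {2, 0, 1};   // process 0's clock
    int msg[NPROC]   = {1, 3, 1};   // clock attached to an incoming message
    vc_on_receive(local, msg, 0);
    printf("merged clock: [%d, %d, %d]\n", local[0], local[1], local[2]); // [3, 3, 1]
}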
System Models
- Synchronous: Known message delay, global clock
- Asynchronous: No timing assumptions
Failure Models
- Crash-Fail: Deterministic, node crashes
- Partition: Node split/network disconnection
- Byzantine: Arbitrary failures (even malicious)
Distributed Mutual Exclusion
Assumptions:
- Reliable network
- Asynchronous processes
- Crash-fail only
Approaches:
- Centralized Lock Server:
- All requests go to a single server
- Server grants access
- Problem: single point of failure
- Ring-based Algorithm:
- Token is passed
- No central server
- Problem: ring breakage
- Shared Priority Queue (Lamport Queues):
- Timestamps
- Floods messages → overhead
- Ricart & Agrawala:
- Combines request and release
- Optimizes Lamport’s idea
Distributed Transactions
- Transactions must be atomic (all or nothing)
- ACID properties
Two-phase locking:
- Acquire all locks
- Release all locks
Coordinator:
- One node starts transaction
- Decides to commit/abort
Two-Phase Commit (2PC)
- Voting Phase:
- All nodes vote commit or abort
- Once voted commit, cannot revert
- Commit Phase:
- Actually apply changes
Failure Handling:
- Participant crash → Ask coordinator for result
- No response → Default to abort
- Coordinator crash → Elect a new one or use previously sent result
Three-Phase Commit (3PC)
Adds a step to improve liveness:
- Vote
- Pre-commit broadcast
- Commit notice
Helps if coordinator crashes → everyone knows the decision.
Problem:
- If half receive, half don’t → partition
- Majority continues, minority syncs later
CAP Theorem
You can have only two out of:
- Consistency
- Availability
- Partition Tolerance
Partitioning is non-negotiable in distributed systems.
ZooKeeper: Coordination service used for leader election, configuration, and locking in distributed systems.
CS 483 - Analysis of Algorithms
Note: For this class — just do a bunch of Leetcode to build intuition. Then, learn how to write proofs. That’s it. This is a very applied class; you won’t get much out of just reading theory.
Representative Problems
- Interval Scheduling Problem
- Solved using greedy algorithms
- Weighted Interval Scheduling
- Similar to interval scheduling but with priorities
- Solved using dynamic programming
- Bipartite Matching
- Pairings and maximum matchings in bipartite graphs
- Network Flow
- Flow problems on graphs (e.g., Ford-Fulkerson)
- NP-Complete Problems
- Hard to solve efficiently
- Easy to verify solutions
- Example: Competitive Facility Location
- Find optimal spacing among facilities
Complexity Classes
- NP-Complete: Solutions are easy to verify but believed hard to compute
- PSPACE-Complete: Believed even harder than NP-complete; even verifying a candidate solution efficiently is not known to be possible (e.g., game-like problems such as Competitive Facility Location)
Algorithm Paradigms
Divide and Conquer
- Break problem into subproblems
- Solve recursively
- Combine solutions
Greedy Algorithms
- Construct solution step-by-step
- Always choose the best decision at the current step
- Hope that local optimum → global optimum
Dynamic Programming
- Consider all possible solutions, choose the best
- Look for repeated subproblems
- Two main approaches:
- Memoization: Top-down + recursion + caching
- Tabulation: Bottom-up + iterative + often array-based
When to use DP?
- Recursive solutions with repeated calculations
- Can transform recursion → memoization → tabulation
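Fibonacci is the usual toy illustration of the two styles (my example, not from class):

#include <stdio.h>
#include <string.h>

long long memo[91];

// Top-down: recursion + caching
long long fib_memo(int n) {
    if (n <= 1) return n;
    if (memo[n] != -1) return memo[n];                   // reuse a repeated subproblem
    return memo[n] = fib_memo(n - 1) + fib_memo(n - 2);
}

// Bottom-up: iterative table fill
long long fib_tab(int n) {
    long long dp[91];
    dp[0] = 0; dp[1] = 1;
    for (int i = 2; i <= n; i++) dp[i] = dp[i - 1] + dp[i - 2];
    return dp[n];
}

int main(void) {
    memset(memo, -1, sizeof memo);
    printf("memoization: %lld\n", fib_memo(50));
    printf("tabulation:  %lld\n", fib_tab(50));
}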
Graph Algorithms
- BFS (Breadth-First Search)
- DFS (Depth-First Search)
- Dijkstra’s Algorithm (shortest path)
- Topological Sort (for DAGs)
- Minimum Spanning Tree (MST)
- Prim’s Algorithm
- Kruskal’s Algorithm
Proof Techniques
- Lemma + Corollary + Subproblem + Algorithm format
- Inductive Reasoning:
- Base case
- Inductive step (assume n)
- Prove for n+1
CS 480 - Introduction to Artificial Intelligence
A combination of search, logic, learning, and reasoning. Examples and proofs help more than pure theory memorization.
Constraint Satisfaction Problems (CSPs)
- Identification problem: We care if a state is a goal state, not how we got there.
- Components:
- Variables: Set of objects that will take on values
- Domain: Set of all possible values for each variable
- Constraints: Restrictions on variable values with respect to other variables
- Complexity: Usually NP-hard
- Typical solver: Backtracking
Arc Consistency:
- Represent each undirected edge as two directed arcs
- Maintain arcs such that for every value in variable A, there’s a consistent value in variable B
- Can be thought of as an adjacency list
k-Consistency:
- Extends arc consistency to sets of k variables
Forward Checking:
- Eliminate inconsistent values ahead of time to reduce backtracking
Heuristics:
- Minimum Remaining Value (MRV): Choose the variable with the fewest legal values
- Least Constraining Value (LCV): Choose the value that eliminates the fewest choices for the neighboring variables
Tree-structured CSP:
- Can be solved efficiently using topological sort
Local Search for CSPs:
- Start from a random assignment
- Iteratively reassign values to minimize conflicts
- Constant space, but incomplete and suboptimal as the constraint-to-variable ratio increases
Example: Map Coloring
- Variables: {WA, NT, SA, Q, NSW, V, T}
- Domain: {Red, Green, Blue}
- Constraint: Adjacent regions must not share the same color
Example: N-Queens (4x4)
- Queens: Q1–Q4 each in different columns
- Place Q1 in row 2, Q2 in row 4, Q3 in row 1, Q4 in row 3 → No queen attacks another
Peak Finding
Hill Climbing:
- Always move towards increasing value
- May get stuck at local maxima
- Incomplete
Simulated Annealing:
- Occasionally allows downhill moves to escape local maxima
- Balances exploration and exploitation
Genetic Algorithm:
- Local beam search with crossover and mutation
- Explore multiple paths in parallel and favor the most promising
Search Problems
- World State: All information about the environment
- Search State: Minimal info needed for search algorithm
Uninformed Search
Algorithm | Complete | Optimal | Notes |
---|---|---|---|
Depth-First Search | No | No | Can get stuck in loops |
Breadth-First Search | Yes | Yes | If all edges have equal cost |
Uniform Cost Search | Yes | Yes | Explores lowest cost path first |
Iterative Deepening | Yes | Yes | Combines BFS + DFS |
Bidirectional Search | Yes | Yes | If both searches use UCS |
Informed Search
Algorithm | Complete | Optimal | Notes |
---|---|---|---|
Greedy Search | No | No | May get stuck in loops |
A* Search | Yes | Yes | Requires admissible and consistent heuristic |
- Good (admissible) heuristic: Never overestimates the cost to the goal
Adversarial Search
Minimax — Step-by-Step Example
MAX
/ \
MIN MIN
/ \ / \
3 5 2 9
- Evaluate the left MIN:
- Children: 3, 5 → MIN = 3
- Evaluate the right MIN:
- Children: 2, 9 → MIN = 2
- MAX chooses the maximum of 3 and 2 → Best move = Left Subtree → Value = 3
Alpha-Beta Pruning Example (using above tree)
MAX
/ \
MIN MIN
/ \ / \
3 5 2 9
- Alpha = -inf, Beta = inf
- Left MIN: 3 (Beta = 3), then 5 → no pruning
- Right MIN: 2 (Beta = 2) < Alpha (3) → PRUNE 9
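A depth-2 minimax with alpha-beta over the same leaves (3, 5, 2, 9), sketched in C:

#include <stdio.h>
#include <limits.h>

int leaves[2][2] = { {3, 5}, {2, 9} };   // children of the left and right MIN nodes

// MIN layer: return the smallest child, pruning once value <= alpha
int min_value(int node, int alpha, int beta) {
    int v = INT_MAX;
    for (int i = 0; i < 2; i++) {
        if (leaves[node][i] < v) v = leaves[node][i];
        if (v <= alpha) {   // MAX already has something >= v elsewhere: prune the rest
            printf("MIN %d: v=%d <= alpha=%d, pruning remaining children\n", node, v, alpha);
            return v;
        }
        if (v < beta) beta = v;
    }
    return v;
}

int main(void) {
    int alpha = INT_MIN, best = INT_MIN;
    for (int node = 0; node < 2; node++) {    // MAX layer
        int v = min_value(node, alpha, INT_MAX);
        if (v > best) best = v;
        if (best > alpha) alpha = best;
    }
    printf("minimax value = %d\n", best);      // 3, choosing the left subtree
}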
Expectimax
- MIN nodes replaced with average of possible outcomes
MAX
/ \
Chance Chance
/ \ / \
3 5 2 9
- Chance 1: (3+5)/2 = 4
- Chance 2: (2+9)/2 = 5.5
- MAX picks 5.5 → right branch
Sampling Methods in Bayes Nets
Bayes Net Example:
Cloudy → Rain
↘ ↓
Sprinkler → WetGrass
- P(Cloudy) = 0.5
- P(Rain | Cloudy) = 0.8
- P(Sprinkler | Cloudy) = 0.1
- P(WetGrass | Rain, Sprinkler) = 0.99
Prior Sampling
- Generate samples from root → leaves
- Sample: Cloudy = True → Rain = True → Sprinkler = False → WetGrass = True
- Repeat for 1000 samples, estimate P(WetGrass=True)
Rejection Sampling
- Generate prior samples, discard those not matching evidence (e.g., WetGrass=True)
Likelihood Weighting
- Fix evidence (e.g., WetGrass=True)
- Weight each sample based on probability of evidence given parents
Gibbs Sampling
- Initialize all variables randomly
- Repeatedly sample each non-evidence variable conditioned on Markov blanket
CS 455 - Networking
Internet Overview
- The Internet is a global computer network interconnecting billions of users.
- End systems: Devices like computers and servers.
- ISPs (Internet Service Providers): Provide access to the internet.
- Local ISP → Regional ISP → Tier 1 ISP
Packet Transmission
- Data sent as packets.
- Packets travel through links (fiber optics, coaxial cable, wireless) and packet switches.
Types of Network Delay
- Processing Delay: Time to examine packet header and check for errors.
- Queueing Delay: Time waiting in buffer before transmission.
- Transmission Delay: Time to push bits onto the link.
- Propagation Delay: Time to travel across the link.
Throughput = the actual end-to-end data rate you achieve (your share of the available bandwidth, limited by the bottleneck link).
TCP/IP Model
Layer | Protocols | Notes |
---|---|---|
Application | HTTP, SMTP, FTP, DNS | End-to-end communication |
Transport | TCP, UDP | End-to-end delivery. TCP: reliable, UDP: fast |
Network | IP | Routing and forwarding between hosts |
Link | Ethernet, Wi-Fi | Transmission between nodes |
Physical | Coaxial, Fiber Optic | Signal transmission medium |
Encapsulation:
[Link Header [Network Header [Transport Header [Application Header [Data]]]]]
Network Security
- DoS Attack: Overwhelms target with traffic
- Packet Sniffing: Monitors data
- IP Spoofing: Fake IP addresses
- Mitigation: Endpoint authentication, encryption
Application Layer
Models
- Client-Server
- Peer-to-Peer (P2P)
Socket
- Interface for process-to-process communication
Key Considerations:
- Reliability (data guaranteed?)
- Throughput (rate of data)
- Loss Tolerance (okay to drop packets?)
- Timing (real-time required?)
- Security (data integrity/authentication?)
DNS (Domain Name System)
- Maps domain names to IP addresses
- Uses UDP
DNS Hierarchy:
Client → Local DNS Resolver
|
↓
[Root Server]
|
↓
[TLD Server]
|
↓
[Authoritative Server]
|
↓
Final IP Address
Search Types:
- Iterative: Resolver walks through each level
- Recursive: Resolver asks root, which queries each level
DNS Record Types:
- A → Hostname to IP
- NS → Domain to name server
- CNAME → Alias to canonical
- MX → Mail exchange
HTTP
- Runs on TCP; stateless. Connections can be nonpersistent (one object per connection) or persistent (HTTP/1.1 default)
- Cookies: Track sessions
- Web Caches: Reduce latency and load
SMTP
- Transfers mail over TCP
- User Agent → Mail Server → Recipient Server
- Display requires POP3, IMAP, or HTTP
Transport Layer
Reliable Data Transfer
- Checksum: Detect bit errors
- ACK/NACK: Confirm or reject packets
- Sequence Numbers: Ensure correct ordering
- Timeouts: Retransmit if no ACK received
Pipelining
- Go-Back-N (GBN):
- Sliding window
- Cumulative ACK
- One timer per window
Sender Window Size = 4
Packets Sent: [0, 1, 2, 3] → Awaiting ACKs
ACK for 0 received → Slide window to [1, 2, 3, 4]
Packets 1 and 2 lost in transit
Packet 3 arrives out of order → Receiver discards it and re-sends cumulative ACK 0
Timeout for Packet 1 → Resend: 1, 2, 3, 4
- Selective Repeat:
- Individual ACKs
- Timer per packet
Sender Window Size = 4
Packets Sent: [0, 1, 2, 3]
ACKs received for 0 and 2
Timer expires only for packet 1 → Resend packet 1
Only missing packet is resent → Bandwidth-efficient
UDP
- Lightweight, best-effort delivery, optional checksum
TCP
- Reliable byte-stream service
- Combines Go-Back-N and Selective Repeat
- ACK = next expected byte
- Fast retransmit: After 3 duplicate ACKs
- Timeouts: Derived from an exponentially weighted moving average of RTT samples
Sequence: [1000, 1001, 1002]
ACKs: ACK=1003 (cumulative: everything up to 1002 received; 1003 is the next expected byte)
Fast Retransmit:
Duplicate ACKs for 1001 ×3 → Immediate resend of 1001
TCP Flow Example:
Client Server
| SYN ---------------------> |
| |
| <------------------ SYN-ACK |
| ACK ---------------------> |
Congestion Control:
- Additive Increase, Multiplicative Decrease
- Slow Start: Exponential ramp-up
- Congestion Avoidance: Linear increase post-threshold
CWND (Congestion Window Size)
^
| /\
| / \
| / \
|_____/ \_______
Additive Multiplicative
Increase Decrease
Network Layer
IP Model
- Best-effort: No delivery guarantees
Planes
- Data Plane: Packet forwarding
- Control Plane: Path computation (routing)
DHCP (Dynamic Host Configuration Protocol)
- Assigns IP addresses dynamically
- Runs over UDP
DHCP Message Exchange:
- Discovery (Host → 255.255.255.255)
- Offer (Server)
- Request (Host)
- ACK (Server)
NAT (Network Address Translation)
- Private IP → Public IP + Port
- Reduces global IP usage
VPN
- Adds encrypted headers (tunneling)
IPv6 Transition
- Uses tunneling to wrap IPv6 in IPv4 headers
Devices
Device | Function |
---|---|
Router | Forwards packets (network) |
Switch | Forwards frames (link) |
Firewall | Permits or denies traffic |
NAT | IP translation for networks |
Routing Protocols
- Link-State:
- Flood costs to all
- Compute shortest paths (e.g., Dijkstra)
- Distance Vector:
- Bellman-Ford algorithm
- Slower convergence
Routing Domains
- BGP: Inter-AS (between networks)
- OSPF: Intra-AS (within networks)
Forwarding
- Uses forwarding tables
- Matches IP address prefix to interface
| Prefix | Interface |
|---------------|-----------|
| 192.168.1.0/24| eth0 |
| 10.0.0.0/8 | eth1 |
| 0.0.0.0/0 | eth2 | ← Default route
Packet Dest: 192.168.1.34 → eth0
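Longest-prefix match over the table above, sketched in C with addresses as 32-bit integers (helper names are mine):

#include <stdio.h>
#include <stdint.h>

struct route { uint32_t prefix; int len; const char *iface; };

struct route table[] = {
    { 0xC0A80100, 24, "eth0" },   // 192.168.1.0/24
    { 0x0A000000,  8, "eth1" },   // 10.0.0.0/8
    { 0x00000000,  0, "eth2" },   // 0.0.0.0/0 (default route)
};

const char *lookup(uint32_t dest) {
    const char *best = NULL;
    int best_len = -1;
    for (int i = 0; i < 3; i++) {
        uint32_t mask = table[i].len ? 0xFFFFFFFFu << (32 - table[i].len) : 0;
        if ((dest & mask) == table[i].prefix && table[i].len > best_len) {
            best = table[i].iface;         // keep the most specific (longest) match
            best_len = table[i].len;
        }
    }
    return best;
}

int main(void) {
    uint32_t dest = 0xC0A80122;            // 192.168.1.34
    printf("192.168.1.34 -> %s\n", lookup(dest));   // eth0
}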
Link Layer
- Transmits between adjacent nodes using MAC addresses
ARP (Address Resolution Protocol)
- IP → MAC address
- Uses broadcast to discover addresses
Channel Access Methods
Protocol | Description |
---|---|
FDMA | Divide channel by frequency |
TDMA | Divide channel by time |
CDMA | Assign unique codes to users |
Slotted ALOHA | Time slots; retry on collision |
Pure ALOHA | No time slots; even higher collision rate |
CSMA | Sense channel before sending |
CSMA/CD | Detect collision and stop transmission |
FDMA
User 1 → 0–1 MHz
User 2 → 1–2 MHz
User 3 → 2–3 MHz
Each gets a frequency band.
TDMA
Time Slot Assignment:
Slot 1: User A
Slot 2: User B
Slot 3: User C
Users transmit in their time window only.
CDMA
User 1: Code 1001
User 2: Code 1100
Both send at once, decoded at receiver using code correlation.
ALOHA
User 1 sends packet → Collision with User 2
Wait random time, retransmit
Pure ALOHA: Random anytime
Slotted ALOHA: Only at slot boundaries
CSMA/CD Example
User 1 senses channel → Idle → Transmit
Collision occurs with User 2
Both stop immediately
Wait random time → Retry
Switches vs Routers
Device | Layer | Purpose |
---|---|---|
Switch | Link | Forward & flood frames |
Router | Network | Forward packets |