SIMT Architecture

a multiprocessor creates, manages, schedules, and executes threads in groups of 32 called warps

individual threads composing a warp start together at the same program address, but each has its own instruction address counter and register state, so they are free to branch and execute independently

when a multiprocessor is given one or more thread blocks to execute, it partitions them into warps, and each warp gets scheduled for execution by a warp scheduler

within a block, the first warp (warp 0) has threads 0-31, the second warp (warp 1) has threads 32-63, ...
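
a rough sketch of that partitioning in plain C++ (host-side only, no GPU needed). it assumes a 1D block, a warp size of 32, and warps numbered from 0; the names kWarpSize, warp_of, lane_of, and warps_needed are made up for illustration and are not part of any CUDA API.

```cpp
// Host-side sketch of how a block's threads map onto warps.
// Assumes a 1D block and a warp size of 32; all names are illustrative.
constexpr int kWarpSize = 32;

// Index of the warp (within the block) that holds a given thread.
constexpr int warp_of(int thread_idx) { return thread_idx / kWarpSize; }

// Position of the thread inside its warp (0..31).
constexpr int lane_of(int thread_idx) { return thread_idx % kWarpSize; }

// Number of warps a block of the given size occupies; a partially
// filled last warp still counts as a whole warp.
constexpr int warps_needed(int block_size) {
    return (block_size + kWarpSize - 1) / kWarpSize;
}
```

so thread 31 is the last lane of warp 0, thread 32 is lane 0 of warp 1, and a 100-thread block takes 4 warps, with the last one only partly filled.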

----------------------

"with Independent Thread Scheduling, the GPU maintains execution state per thread, including a program counter and call stack, and can yield execution at a per-thread granularity, either to make better use of execution resources or to allow one thread to wait for data to be produced by another"

can't tell if the "one thread can wait for data to be produced by another" part is a good thing or not. seems useful if you have a single data point and need to run 32 sequential steps on it within a single warp, but that also means no parallelism, so not good?

wait, if you have 4 data points and need to apply 32 steps to all 4: instead of putting all 4 in, doing the op, writing the output, then reading again, op, write, ... (32 reads and 32 writes), could you use 4 warps with 1 data point each, run the 32 steps across the 32 threads of each warp, and get it all done with a single read and write?

never mind, this is wrong: it would still be a single read and write with the 4 data points on 4 threads of a single warp, because the same instruction is applied to each one and nothing needs to be written out between the initial read and the final write
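
to check that reasoning, a host-side C++ sketch (not real GPU code) that mimics 4 threads each owning one data point: each one loads its value once, applies every step on a local copy (standing in for a register), and stores once. the step function, the Traffic counters, and run_steps are placeholders made up for this sketch; the actual operation isn't specified in these notes.

```cpp
#include <array>
#include <cstddef>

// Counts of simulated global-memory accesses.
struct Traffic {
    int reads = 0;
    int writes = 0;
};

// Placeholder for whatever one step does; the real operation is not specified.
inline float step(float x) { return x * 0.5f + 1.0f; }

// Each loop iteration stands in for one thread of a warp handling one data point.
template <std::size_t N>
Traffic run_steps(std::array<float, N>& data, int num_steps) {
    Traffic t;
    for (std::size_t tid = 0; tid < N; ++tid) {
        float reg = data[tid];   // single read into a "register"
        ++t.reads;
        for (int s = 0; s < num_steps; ++s) {
            reg = step(reg);     // intermediate results never leave the register
        }
        data[tid] = reg;         // single write back
        ++t.writes;
    }
    return t;
}
```

with 4 data points and 32 steps this does 4 reads and 4 writes in total (one of each per thread), no matter how many steps run in between.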

----------------------------------------------------