Y. Tseng, R. F. DeMara, and P. Wilder, "Distributed-Sum Termination Detection Supporting Multithreaded Execution," Parallel Computing, Vol. 29, No. 7, July, 2003, pp. 953-968.

Abstract

A fast, wire-efficient synchronization technique is developed that supports dynamic allocation of multiple threads on shared-memory, message-passing, and/or single-chip multiprocessors. The proposed distributed-sum bit-comparison (DSBC) method employs the execution-sequence invariant property such that the instantaneous process production equals the instantaneous process consumption only upon barrier completion. For a system of n processing elements (PEs), a single instance of the global logic unit, and n instances of the local logic unit, interconnected by 3n wires, are shown to provide direct support for any arbitrary number of barriers. The barrier detection time is shown to scale linearly in terms of the number of active barriers in the system. Comparisons to Wired-NOR hardware and Shared-Lock software approaches indicate reduced barrier detection time, decreased inter-PE wiring requirements, and increased functionality. Suitability of adaptation of the DSBC method to a skew-insensitive clockless design is also discussed.

Complete Paper Available at: http://www.cal.ucf.edu/publications/journal/J21_tseng_etal_PC.pdf
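
The production/consumption invariant stated in the abstract can be illustrated with a small software model. This is only a sketch of the counting idea, not the paper's hardware DSBC circuit: the names `ProcessingElement`, `barrier_complete`, and `run_demo` are hypothetical, and the per-PE integer counters with a summed global comparison are an assumption standing in for the local and global logic units. It assumes the root thread is counted as one produced process, so the two totals match only once every thread has finished.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProcessingElement:
    """Hypothetical software stand-in for one PE's local counters."""
    produced: int = 0  # threads created on this PE
    consumed: int = 0  # threads that finished on this PE

    def spawn(self, count: int = 1) -> None:
        self.produced += count

    def finish(self, count: int = 1) -> None:
        self.consumed += count


def barrier_complete(pes: List[ProcessingElement]) -> bool:
    """Global check: total production equals total consumption."""
    total_produced = sum(pe.produced for pe in pes)
    total_consumed = sum(pe.consumed for pe in pes)
    return total_produced == total_consumed


def run_demo() -> List[bool]:
    """Record the global check after each step of a tiny fork/join run."""
    pes = [ProcessingElement() for _ in range(3)]
    history = []

    pes[0].spawn()                         # root thread starts on PE 0
    history.append(barrier_complete(pes))  # 1 produced, 0 consumed

    pes[0].spawn(2)                        # root forks two children
    pes[1].finish()                        # one child ran on PE 1 and finished
    history.append(barrier_complete(pes))  # 3 produced, 1 consumed

    pes[2].finish()                        # other child finished on PE 2
    history.append(barrier_complete(pes))  # 3 produced, 2 consumed

    pes[0].finish()                        # root finishes last
    history.append(barrier_complete(pes))  # 3 produced, 3 consumed
    return history


if __name__ == "__main__":
    print(run_demo())  # [False, False, False, True]
```

In this model a thread may finish on a different PE from the one that created it, so no single PE's counters balance on their own; only the sum across all PEs indicates completion.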