Having said that, we all know that a processor with more threads than cores can perform more tasks simultaneously, and in fact the operating system detects the processor as if it had as many cores as it has threads. For example, an Intel Core i7-8700K has 6 cores and 12 threads thanks to HyperThreading technology, and Windows 10 recognizes it as a 12-core processor (although, strictly speaking, it calls them “logical processors”), because to the operating system its operation is completely transparent.
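You can see this transparency from any language that asks the OS how many processors it has: the answer is the logical-processor count, not the physical-core count. A minimal Python sketch (the printed wording is my own):

```python
# Sketch: the OS-visible processor count is the number of LOGICAL
# processors (hardware threads), so a 6-core/12-thread chip like the
# i7-8700K reports 12 here, not 6.
import os

logical = os.cpu_count()
print(f"Logical processors visible to the OS: {logical}")
```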
What is multi-threaded processing?
In computer architecture, multi-threaded processing is the ability of the central processing unit (CPU) to provide multiple threads of execution at the same time, supported by the operating system. This approach differs from multiprocessing and should not be confused with it; in a multithreaded application, threads share the resources of one or more processor cores, including compute units, caches, and the translation lookaside buffer (TLB).
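To make the contrast with multiprocessing concrete, here is a minimal, hedged Python sketch (names are illustrative, not from the article): the two threads below live in the same address space and write to the same list object, whereas two separate processes would each get their own private copy of it.

```python
import threading

# Both threads share this single list object (one address space),
# unlike multiprocessing, where each process has its own memory.
shared = []

def worker(value):
    shared.append(value)  # this write is visible to the other thread

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))  # both writes landed in the shared list: [0, 1]
```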
Whereas multiprocessing systems include multiple complete processing units in one or more cores, multithreading aims to increase the utilization of a single core by using thread-level parallelism as well as instruction-level parallelism. Since the two techniques are complementary, they are combined in almost all modern system architectures, which feature multiple multi-threaded processors and multi-core processors capable of operating with multiple threads.
The multithreaded paradigm became more popular as efforts to further exploit instruction-level parallelism (i.e., executing multiple instructions from one thread in parallel) stalled in the late 1990s. This allowed the concept of throughput computing to re-emerge from the more specialized field of transaction processing.
Although it is very difficult to further speed up a single thread or program, most computer systems are in fact multitasking between multiple threads or programs, so techniques that improve the performance of all tasks result in overall performance gains. In other words, the more instructions a processor can process at the same time, the better the overall performance of the entire system.
Even multi-threaded processing has its drawbacks
Aside from the raw performance gains, one of the benefits of multi-threaded processing is that if one thread suffers a lot of cache misses, other threads can continue to take advantage of the unused CPU resources, which can lead to faster overall execution, since those resources would have sat idle if only one thread were running. Additionally, if a thread cannot use all of the CPU’s resources (for example, because its instructions depend on the results of previous ones), running another thread can prevent those resources from going idle.
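The latency-hiding idea can be sketched in plain Python, using time.sleep() as a stand-in for a long-latency stall such as an off-chip memory access. This is only an analogy at the OS-thread level, not a model of real hardware: while one thread is “stalled”, the other still makes progress, so total wall time is close to the longest single wait rather than the sum of both.

```python
import threading
import time

def stalled():
    time.sleep(0.2)   # stand-in for a long-latency stall (e.g. cache miss)

def busy():
    time.sleep(0.2)   # a second, independent task

start = time.perf_counter()
t1 = threading.Thread(target=stalled)
t2 = threading.Thread(target=busy)
t1.start(); t2.start()
t1.join(); t2.join()
elapsed = time.perf_counter() - start

# Running the two tasks back to back would take about 0.4 s; overlapped,
# the total is close to 0.2 s.
print(f"elapsed ≈ {elapsed:.2f}s")
```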
However, everything also has its negative side. Multiple threads can interfere with each other when sharing hardware resources, such as caches or translation lookaside buffers. As a result, single-threaded execution times are not improved and may even degrade when only a single thread is running, due to lower frequencies or the additional pipeline stages needed to accommodate the thread-switching hardware.
Overall effectiveness varies; Intel claims that its HyperThreading technology improves performance by about 30%, while a synthetic program that merely performs a loop of non-optimized, dependent floating-point operations can actually see a 100% improvement when run in parallel. On the other hand, hand-tuned assembly language programs that use MMX or AltiVec extensions and prefetch their data (like a video encoder) do not suffer from cache misses or idle resources, so they do not benefit at all from multi-threaded execution and may in fact see their performance degraded by contention for shared resources.
From a software perspective, hardware support for multithreading is fully visible to programs, requiring changes to both application programs and the operating system itself. The hardware techniques used to support multithreaded processing often parallel the software techniques used for multitasking; thread scheduling is also a major problem in multithreading.
Types of multi-threaded processing
As we said at the beginning, we all tend to think of multi-threaded processing as simply parallelizing processes (i.e., running multiple tasks at the same time), but in reality things are a little more complicated than that, and there are different types of multi-threaded processing.
Coarse-grained multithreading
The simplest type of multithreading occurs when one thread runs until it is blocked by an event that would normally create a long-latency stall. Such a stall might be a cache miss that has to access off-chip memory, which can take hundreds of CPU cycles before the data returns. Instead of waiting for the stall to resolve, the processor switches execution to another thread that is already ready to run, and only when the data for the previous thread has arrived is that thread placed back on the list of ready-to-run threads.
Conceptually, this is similar to the cooperative multitasking used in real-time operating systems, in which tasks voluntarily give up CPU time when they have to wait for an event to occur. This type of multithreading is known as “block” or “coarse-grained” multithreading.
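The switch-on-stall policy can be illustrated with a toy software simulation (entirely my own sketch; real hardware does this inside the pipeline, not in Python). Each “thread” is a generator that reports either useful work or a long-latency stall, and the “processor” switches to the next ready thread whenever it sees a stall instead of idling.

```python
from collections import deque

def make_thread(name, steps):
    # Yields (name, "work") or (name, "stall"); a "stall" models a
    # cache miss that would otherwise leave the core idle.
    for kind in steps:
        yield name, kind

ready = deque([
    make_thread("A", ["work", "stall", "work"]),
    make_thread("B", ["work", "work"]),
])

trace = []
while ready:
    current = ready.popleft()
    for name, kind in current:
        trace.append((name, kind))
        if kind == "stall":
            # Switch away; re-queue this thread as if its data
            # will have returned by the time it runs again.
            ready.append(current)
            break
    # If the generator finished without stalling, the thread is done.

print(trace)
# → [('A', 'work'), ('A', 'stall'), ('B', 'work'), ('B', 'work'), ('A', 'work')]
```

Note how thread A runs until its stall, B then runs to completion, and A only resumes afterwards, exactly the coarse-grained pattern described above.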
Interleaved or fine-grained multithreading
In this type of multithreading, the processor switches to a different thread every CPU cycle. The purpose is to remove all data-dependency stalls from the execution pipeline: since each thread is relatively independent of the others, there is less chance that an instruction in one pipeline stage needs the output of an earlier instruction in the same pipeline. Conceptually, it is similar to the preemptive multitasking used in operating systems; the analogy would be that the time slice given to each active thread is one CPU cycle.
Of course, this type of multi-threaded processing has a major drawback: each pipeline stage has to track the thread ID of the instruction it is processing, which slows it down. In addition, since more threads are running concurrently in the pipeline, shared resources such as caches must be larger to avoid misses.
Simultaneous multithreading (SMT)
The most advanced type of multithreading applies to superscalar processors. Whereas a typical superscalar processor issues multiple instructions from a single thread on each CPU cycle, in simultaneous multithreading (SMT) a superscalar processor can issue instructions from multiple threads on each cycle. Recognizing that any single thread has a limited amount of instruction-level parallelism, this type of multithreading tries to exploit the parallelism available across multiple threads to reduce the waste associated with unused issue slots.
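A toy model of SMT issue may help (my own sketch, far simpler than real hardware scheduling): a 2-wide core fills its issue slots each cycle from whichever threads still have pending instructions, so slots that one thread cannot fill are used by another.

```python
# Pending instruction streams for two hypothetical threads.
threads = {"A": ["a0", "a1", "a2"], "B": ["b0", "b1"]}
ISSUE_WIDTH = 2  # a "2-wide" superscalar core

cycles = []
while any(threads.values()):
    slots = []
    # Each cycle, take at most one instruction per thread until the
    # issue slots are full (a gross simplification of real SMT).
    for name in threads:
        if threads[name] and len(slots) < ISSUE_WIDTH:
            slots.append(threads[name].pop(0))
    cycles.append(slots)

print(cycles)  # → [['a0', 'b0'], ['a1', 'b1'], ['a2']]
```

In the last cycle only thread A has work left, so one issue slot goes empty; with a single-threaded core, every cycle in which thread A stalled would waste both slots.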
To distinguish the other types of multithreading from SMT, the term “temporal multithreading” is used to denote when instructions from only one thread can be issued at a time. Implementations of SMT include the DEC Alpha EV8 (never completed), Intel’s HyperThreading technology, the IBM POWER5, Sun Microsystems’ UltraSPARC T2, the Cray XMT, and AMD’s Bulldozer and Zen microarchitectures.