primesieve uses the segmented sieve of Eratosthenes with wheel factorization. This algorithm has a complexity of $O(n \log \log n)$ operations and uses $O(\sqrt{n})$ space.
Segmentation is currently the best known practical improvement to the sieve of Eratosthenes. Instead of sieving the interval [2, n] at once, one subdivides the sieve interval into a number of equal-sized segments that are then sieved consecutively. Segmentation reduces the memory requirement of the sieve of Eratosthenes from $O(n)$ to $O(\sqrt{n})$. The segment size is usually chosen to fit into the CPU's fast L1 or L2 cache memory, which significantly speeds up sieving. A segmented version of the sieve of Eratosthenes was first published by Singleton in 1969; Bays and Hudson describe the algorithm in more detail.
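To make the idea of segmentation concrete, here is a minimal C++ sketch (not primesieve's code) that counts the primes in [2, n]. It first finds the sieving primes up to $\sqrt{n}$ with a small unsegmented sieve, then sieves [2, n] one fixed-size segment at a time while reusing a single buffer, so memory stays at $O(\sqrt{n})$ plus one segment. The function name `count_primes_segmented` and the default segment size of 32768 (assumed to match a typical 32 KiB L1 data cache) are illustrative choices, and the sketch uses one byte per number for clarity rather than a bit array.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Counts the primes in [2, n] with a segmented sieve of Eratosthenes.
// segment_size is measured in numbers (one byte each); the default of
// 32768 is an assumed value chosen to fit a 32 KiB L1 data cache.
uint64_t count_primes_segmented(uint64_t n, uint64_t segment_size = 32768) {
    assert(segment_size > 0);
    if (n < 2) return 0;

    // Integer square root of n (corrected after the floating-point guess).
    uint64_t sqrt_n = static_cast<uint64_t>(std::sqrt(static_cast<double>(n)));
    while (sqrt_n * sqrt_n > n) --sqrt_n;
    while ((sqrt_n + 1) * (sqrt_n + 1) <= n) ++sqrt_n;

    // Simple sieve for the sieving primes <= sqrt(n): O(sqrt(n)) memory.
    std::vector<char> small(sqrt_n + 1, 1);
    std::vector<uint64_t> primes;
    for (uint64_t i = 2; i <= sqrt_n; ++i) {
        if (!small[i]) continue;
        primes.push_back(i);
        for (uint64_t j = i * i; j <= sqrt_n; j += i) small[j] = 0;
    }

    uint64_t count = 0;
    std::vector<char> sieve(segment_size);
    // Sieve [low, high] one segment at a time, reusing the same buffer.
    for (uint64_t low = 2; low <= n; low += segment_size) {
        uint64_t high = std::min(low + segment_size - 1, n);
        std::fill(sieve.begin(), sieve.end(), 1);
        for (uint64_t p : primes) {
            // Start at p*p or at the first multiple of p >= low, whichever is larger.
            uint64_t start = std::max(p * p, (low + p - 1) / p * p);
            for (uint64_t j = start; j <= high; j += p) sieve[j - low] = 0;
        }
        for (uint64_t i = low; i <= high; ++i) count += sieve[i - low];
    }
    return count;
}
```

For example, `count_primes_segmented(1000000)` should return 78498, the number of primes below one million, and the result does not depend on the segment size chosen.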
Wheel factorization is used to skip multiples of small primes. If a kth wheel is added to the sieve of Eratosthenes, then only those multiples are crossed off that are coprime to the first k primes, i.e. multiples that are divisible by any of the first k primes are skipped. The 1st wheel (modulo 2) considers only odd numbers, the 2nd wheel (modulo 6) skips multiples of 2 and 3, the 3rd wheel (modulo 30) skips multiples of 2, 3 and 5, and so on. Pritchard has shown that the running time of the sieve of Eratosthenes can be reduced by a factor of $\log \log n$ if the wheel size is $\sqrt{n}$, but for cache reasons the sieve of Eratosthenes usually performs best with a modulo 30 or 210 wheel. Sorenson explains wheels in detail.
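The following sketch illustrates the 3rd (modulo 30) wheel. In every block of 30 integers only 8 are coprime to 30, and the gaps between them repeat with period 8 (4, 2, 4, 2, 4, 6, 2, 6 starting at 7), so a sieve can step from one candidate to the next and, for each sieving prime p, cross off only the multiples p*m with m coprime to 30. The names `kWheel30Gaps`, `wheel_index` and `count_primes_wheel30` are hypothetical; the sketch is neither segmented nor bit-packed and only shows how the wheel removes the work spent on multiples of 2, 3 and 5.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Gaps between consecutive integers coprime to 30, starting at 7:
// 7, 11, 13, 17, 19, 23, 29, 31, 37, ...
constexpr std::array<uint64_t, 8> kWheel30Gaps = {4, 2, 4, 2, 4, 6, 2, 6};

// Position of residue r (mod 30) in the cycle 7, 11, 13, 17, 19, 23, 29, 1,
// i.e. the index of the gap that leads to the next wheel number.
uint64_t wheel_index(uint64_t r) {
    switch (r) {
        case 7:  return 0;
        case 11: return 1;
        case 13: return 2;
        case 17: return 3;
        case 19: return 4;
        case 23: return 5;
        case 29: return 6;
        case 1:  return 7;
    }
    assert(false && "residue is not coprime to 30");
    return 0;
}

// Sieve of Eratosthenes with a modulo 30 wheel: only numbers coprime to
// 2, 3 and 5 are visited or crossed off. Returns the number of primes <= n.
uint64_t count_primes_wheel30(uint64_t n) {
    if (n < 2) return 0;
    std::vector<bool> composite(n + 1, false);
    uint64_t count = 1 + (n >= 3) + (n >= 5);  // the wheel primes 2, 3, 5
    // Walk every wheel number q >= 7; i tracks q's position in the gap cycle.
    for (uint64_t q = 7, i = 0; q <= n; q += kWheel30Gaps[i], i = (i + 1) % 8) {
        if (composite[q]) continue;
        ++count;  // no smaller prime crossed q off, so q is prime
        // Cross off q*m only for multipliers m >= q that are coprime to 30;
        // q*m is then coprime to 30 too, so no multiples of 2, 3 or 5 are touched.
        for (uint64_t m = q, j = wheel_index(q % 30); q * m <= n;
             m += kWheel30Gaps[j], j = (j + 1) % 8) {
            composite[q * m] = true;
        }
    }
    return count;
}
```

Compared with a plain sieve, the inner loop here runs over only 8 of every 30 multipliers, which is the work reduction a modulo 30 wheel provides; a modulo 210 wheel extends the same idea to the prime 7.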
Additionally, primesieve uses Tomás Oliveira e Silva's cache-friendly bucket list algorithm if needed. This algorithm is relatively new: it was devised by Tomás Oliveira e Silva in 2001 to speed up the segmented sieve of Eratosthenes for primes beyond 32 bits. The idea is to store the sieving primes in lists of buckets, with each list associated with a segment. The list related to a specific segment contains only those sieving primes that have one or more multiples in that segment. While sieving a segment, only the primes of the related list are used for sieving, and each prime is reassigned to the list responsible for its next multiple when processed. The benefit of this approach is that it becomes possible to use segments (i.e. sieve arrays) smaller than $\sqrt{n}$ without deteriorating efficiency. This is important because only small segments that fit into the CPU's L1 or L2 cache provide fast memory access.
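A simplified C++ sketch of the bucket idea, under several assumptions: every sieving prime is kept in buckets (primesieve reserves this technique for its big sieving primes), the buckets form a circular array of `std::vector`s indexed by segment number, and a prime joins the buckets once the segment containing p*p is reached. `SievingPrime` and `count_primes_buckets` are hypothetical names, and this is not primesieve's or Oliveira e Silva's actual code. Because the next multiple of p ≤ √n lies at most p/S + 1 segments ahead, roughly √n/S buckets suffice, and a segment only touches the primes that really have a multiple in it, even when S is much smaller than √n.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A sieving prime together with the next multiple it has to cross off.
struct SievingPrime {
    uint64_t prime;
    uint64_t next_multiple;
};

// Counts the primes in [2, n]. Segment k covers [2 + k*S, 2 + (k+1)*S - 1]
// with S = segment_size, and every sieving prime sits in the bucket of the
// segment that contains its next multiple.
uint64_t count_primes_buckets(uint64_t n, uint64_t segment_size) {
    assert(segment_size > 0);
    if (n < 2) return 0;
    uint64_t sqrt_n = 1;
    while ((sqrt_n + 1) * (sqrt_n + 1) <= n) ++sqrt_n;

    // Sieving primes <= sqrt(n), in ascending order, from a simple sieve.
    std::vector<char> small(sqrt_n + 1, 1);
    std::vector<uint64_t> primes;
    for (uint64_t p = 2; p <= sqrt_n; ++p) {
        if (!small[p]) continue;
        primes.push_back(p);
        for (uint64_t j = p * p; j <= sqrt_n; j += p) small[j] = 0;
    }

    // The next multiple of a prime p <= sqrt(n) is at most p/S + 1 segments
    // ahead, so a circular array of sqrt(n)/S + 2 buckets never wraps onto
    // the segment that is currently being sieved.
    const uint64_t num_buckets = sqrt_n / segment_size + 2;
    std::vector<std::vector<SievingPrime>> buckets(num_buckets);
    std::vector<SievingPrime> current;
    std::vector<char> sieve(segment_size);
    std::size_t next_prime = 0;  // first prime that has not joined yet
    uint64_t count = 0;

    for (uint64_t k = 0, low = 2; low <= n; ++k, low += segment_size) {
        uint64_t high = std::min(low + segment_size - 1, n);
        std::fill(sieve.begin(), sieve.end(), 1);
        // Take over this segment's bucket, i.e. the primes whose next
        // multiple lies in [low, high]; the bucket itself is left empty.
        current.clear();
        current.swap(buckets[k % num_buckets]);
        // A prime joins once its square p*p falls into the current segment.
        while (next_prime < primes.size() &&
               primes[next_prime] * primes[next_prime] <= high) {
            uint64_t p = primes[next_prime++];
            current.push_back({p, p * p});
        }
        for (const SievingPrime& sp : current) {
            uint64_t m = sp.next_multiple;
            for (; m <= high; m += sp.prime) sieve[m - low] = 0;
            // Move the prime to the bucket of the segment holding its next multiple.
            if (m <= n) {
                uint64_t seg = (m - 2) / segment_size;
                buckets[seg % num_buckets].push_back({sp.prime, m});
            }
        }
        for (uint64_t i = low; i <= high; ++i) count += sieve[i - low];
    }
    return count;
}
```

With n = 10^6 and a segment size of 64, far below √n = 1000, the sketch uses 17 buckets and still counts 78498 primes.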
primesieve is written entirely in C++ and does not depend on external libraries; it compiles with every standard-compliant C++ compiler. Its speed is mainly due to the segmentation of the sieve of Eratosthenes, which reduces cache misses when crossing off multiples in the sieve array, and to the use of a bit array instead of the more widely used byte (boolean) array. These are the optimizations I use in my implementation:
Uses a bit array with 30 numbers per byte for sieving
Pre-sieves multiples of small primes ≤ 19
Starts crossing off multiples at the square
Uses a modulo 210 wheel that skips multiples of 2, 3, 5 and 7
Uses specialized algorithms for small, medium and big sieving primes
Processes multiple sieving primes per loop iteration to increase instruction-level parallelism
Parallelized (multi-threaded) using OpenMP
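The first optimization in the list, 30 numbers per byte, follows from the modulo 30 wheel: in every block of 30 consecutive integers only 8 are coprime to 30 (residues 1, 7, 11, 13, 17, 19, 23, 29), so one byte with 8 bits covers 30 numbers. The sketch below shows this packing. The bit order, and the names `kResidues`, `bit_of` and `count_primes_bit30`, are assumptions made for illustration; primesieve's own layout may differ, and the sketch omits segmentation and pre-sieving.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// The 8 residues modulo 30 that are coprime to 30. In this sketch bit b of
// byte i stands for the number 30*i + kResidues[b] (an assumed bit order).
constexpr std::array<uint64_t, 8> kResidues = {1, 7, 11, 13, 17, 19, 23, 29};

// Bit index of residue r = x % 30, or -1 if r shares a factor with 30.
int bit_of(uint64_t r) {
    for (int b = 0; b < 8; ++b)
        if (kResidues[b] == r) return b;
    return -1;
}

// Sieve of Eratosthenes over a bit array holding 30 numbers per byte.
// Returns the number of primes <= n.
uint64_t count_primes_bit30(uint64_t n) {
    if (n < 2) return 0;
    // One byte per block of 30 numbers; every candidate starts out as prime.
    std::vector<uint8_t> bits(n / 30 + 1, 0xFF);
    bits[0] &= 0xFE;  // clear the bit for 1, which is not prime

    auto is_set = [&](uint64_t x) { return (bits[x / 30] >> bit_of(x % 30)) & 1; };
    auto clear = [&](uint64_t x) {
        int b = bit_of(x % 30);
        assert(b >= 0);  // only numbers coprime to 30 have a bit
        bits[x / 30] &= static_cast<uint8_t>(~(1u << b));
    };

    for (uint64_t p = 7; p * p <= n; ++p) {
        if (bit_of(p % 30) < 0 || !is_set(p)) continue;
        // Cross off p*m for m >= p with m coprime to 30; multiples of
        // 2, 3 and 5 have no bit in this representation.
        for (uint64_t m = p; p * m <= n; ++m)
            if (bit_of(m % 30) >= 0) clear(p * m);
    }

    uint64_t count = 1 + (n >= 3) + (n >= 5);  // 2, 3 and 5 are not stored
    for (uint64_t i = 0; i < bits.size(); ++i)
        for (int b = 0; b < 8; ++b) {
            uint64_t x = 30 * i + kResidues[b];
            if (x <= n && ((bits[i] >> b) & 1)) ++count;
        }
    return count;
}
```

Compared with a byte (boolean) array holding one number per byte, this layout needs 30 times less memory, and compared with a plain bit array holding one number per bit it still saves a factor of 30/8 = 3.75, so each cache-sized segment covers correspondingly more numbers.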