// vectoreyes/lib.rs
#![deny(missing_docs)]
#![allow(unsafe_op_in_unsafe_fn)]
//! VectorEyes is an (almost entirely) safe and cross-platform wrapper library around vectorized
//! operations.
//!
//! While a normal `add` CPU instruction will add two numbers together, a
//! [SIMD/Vectorized](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) `add`
//! instruction will perform multiple additions from the same instruction. This amortizes the
//! per-instruction cost (e.g. of the CPU decoding the instruction) across all the additions of the
//! single instruction, which can provide large speed boosts on many platforms.
//!
//! Unfortunately, using these operations requires per-platform unsafe intrinsics. To make
//! this easier, VectorEyes provides safe functions which behave identically on all
//! platforms.
//!
//! The core of this crate is its vector types (such as [`U64x2`]). You can think of vectors as
//! arrays with some extra SIMD operations on top.
//!
//! Just like arrays, vectors have an element type ([`u64`] in the example above) and an element
//! count, frequently referred to as _lanes_ (2 in the above example).
//!
//! In fact, you can freely convert between arrays and vectors!
//!
//! ```
//! # use vectoreyes::*;
//! // These two represent the same thing.
//! let vector_form = U64x2::from([123_u64, 456_u64]);
//! let array_form: [u64; 2] = vector_form.into();
//! ```
//!
//! However, the vector form has _special SIMD powers_! These two functions perform the same
//! operation, but the SIMD variant may[^may_be_faster] take better advantage of the CPU hardware.
//!
//! [^may_be_faster]: As always, only a Sith deals in absolutes. The Rust compiler can, in some
//! cases, employ _autovectorization_ to compile code which doesn't use SIMD operations into code
//! which uses SIMD instructions. Unfortunately, the compiler can't always autovectorize the way we
//! want it to, which is why VectorEyes exists!
//!
//! While normal _bog-standard_ arrays don't implement the `+` operator, our vector types do!
//! Adding two vectors together performs pairwise addition, using (for the vector backends) a
//! single CPU instruction!
//!
//! ```
//! # use vectoreyes::*;
//! fn double_without_simd(arr: [u64; 2]) -> [u64; 2] {
//!     [arr[0] + arr[0], arr[1] + arr[1]]
//! }
//! fn double_with_simd(arr: U64x2) -> U64x2 {
//!     arr + arr
//! }
//! assert_eq!(
//!     U64x2::from(double_without_simd([1, 2])),
//!     double_with_simd(U64x2::from([1, 2])),
//! );
//! ```
//!
//! The documentation for every method on a vector (e.g. [`I64x2::and_not`]) lists the equivalent
//! scalar code, as well as information on how the operation is implemented on each backend.
//!
//! # Vector Sizes
//! There aren't vector types for every conceivable `(type, element count)` pair. Instead, we have
//! vector types that correspond to the vector registers that many CPUs have. Because these
//! registers are 128- or 256-bits wide, we choose vector types which also have this size. For
//! example, there's a [`U64x2`] type and a [`U32x4`] type, since both are 128-bits wide. But
//! there's no `U32x2` type, because that'd only be 64-bits wide.
//!
//! # Backends
//! VectorEyes chooses what backend to execute vector operations with at compile-time.
//!
//! ## AVX2
//! x86-64 CPUs that support the `AVX`, `AVX2`, `SSE4.1`, `AES`, `SSE4.2`, and
//! `PCLMULQDQ` features will use the `AVX2` backend.
//!
//! ## Neon
//! This is available on aarch64/arm64 machines with `neon` and `aes` features.
//!
//! ## Scalar
//! This is a fallback implementation that works on all CPUs. It's not
//! particularly performant.
//!
//! # Cargo Configuration
//! If using VectorEyes from the `swanky` repo, all this configuration has already been done for
//! you!
//! ## Native CPU Setup
//! Compile on the machine that you'll be running your code on, and add the
//! following to your `.cargo/config` file:
//! ```toml
//! [build]
//! rustflags = ["-C", "target-cpu=native", "--cfg=vectoreyes-target-cpu-native"]
//! rustdocflags = ["-C", "target-cpu=native", "--cfg=vectoreyes-target-cpu-native"]
//! ```
//! ## Specific CPU Selection
//! If you want to compile for some specific CPU, add the following to your
//! `.cargo/config` file:
//! ```toml
//! [build]
//! rustflags = ["-C", "target-cpu=TARGET", "--cfg=vectoreyes-target-cpu=\"TARGET\""]
//! rustdocflags = ["-C", "target-cpu=TARGET", "--cfg=vectoreyes-target-cpu=\"TARGET\""]
//! ```
//! ## Maximal Compatibility
//! If you do not put any of the above in your `.cargo/config` file,
//! `vectoreyes` will always use its `scalar` backend, which does not use vector
//! instructions.
//!
//! # Limitations
//! VectorEyes was designed around the AVX2 backend. For example, shuffle operations tend to be
//! constrained to 128-bit lanes because that's how the Intel intrinsics are constrained. As a
//! result, while code that uses VectorEyes might be optimal for an Intel platform, it might not be
//! optimal for an ARM platform with different intrinsics. (This is a limitation, generally, with
//! cross-platform SIMD libraries like VectorEyes.)
//!
//! In addition, many SIMD intrinsics are currently not wrapped in VectorEyes.

use std::ops::*;

/// What backend will be used when targeting the current CPU?
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum VectorBackend {
    /// The fallback scalar backend (doesn't use vector instructions).
    Scalar,
    /// A vector backend targeting [AVX2](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Advanced_Vector_Extensions_2).
    Avx2,
    /// A vector backend targeting [ARM Neon](https://developer.arm.com/Architectures/Neon).
    Neon,
}

/// The vector backend that this process is using.
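///
/// # Example
/// Because `VectorBackend` is `#[non_exhaustive]`, matching on this constant from outside the
/// crate needs a catch-all arm:
/// ```
/// # use vectoreyes::*;
/// // Report which backend was selected at compile time.
/// match VECTOR_BACKEND {
///     VectorBackend::Scalar => println!("fallback scalar backend"),
///     VectorBackend::Avx2 => println!("AVX2 backend"),
///     VectorBackend::Neon => println!("ARM Neon backend"),
///     _ => println!("some other backend"),
/// }
/// ```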
pub const VECTOR_BACKEND: VectorBackend = current_vector_backend();

/// Panic if the current binary uses features unsupported by the current CPU.
///
/// `vectoreyes` uses compile-time flags to select which backend to use and which CPU features to
/// require. If this backend is used on an unsupported CPU, it will result in an "Illegal
/// instruction" error. (Technically, _any_ Rust code, not just `vectoreyes` code, may result in
/// undefined behavior if run on a CPU that doesn't support the compile-time selected feature
/// flags.)
///
/// It would be advisable to call this in the `main()` function of executables to try to catch
/// these errors early.
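///
/// # Example
/// ```
/// fn main() {
///     // Fail fast if this binary was compiled with CPU features this machine lacks.
///     vectoreyes::assert_cpu_features();
///     // ... the rest of the program ...
/// }
/// ```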
pub fn assert_cpu_features() {
    vector_backend_check_cpu()
}

/// A scalar that can live in the lane of a vector.
pub trait Scalar:
    'static
    + std::fmt::Debug
    + num_traits::PrimInt
    + num_traits::WrappingAdd
    + num_traits::WrappingSub
    + num_traits::WrappingMul
    + subtle::ConstantTimeEq
    + subtle::ConditionallySelectable
{
    /// A scalar of the same width as this scalar, but signed.
    type Signed: Scalar;
    /// A scalar of the same width as this scalar, but unsigned.
    type Unsigned: Scalar;

    /// A scalar of the same sign as this scalar, but with width 8.
    type SameSign8: Scalar<Signed = i8, Unsigned = u8>;
    /// A scalar of the same sign as this scalar, but with width 16.
    type SameSign16: Scalar<Signed = i16, Unsigned = u16>;
    /// A scalar of the same sign as this scalar, but with width 32.
    type SameSign32: Scalar<Signed = i32, Unsigned = u32>;
    /// A scalar of the same sign as this scalar, but with width 64.
    type SameSign64: Scalar<Signed = i64, Unsigned = u64>;
}
macro_rules! scalar_impls {
    ($(($s:ty, $u:ty)),*) => {$(
        impl Scalar for $s {
            type Signed = $s;
            type Unsigned = $u;

            type SameSign8 = i8;
            type SameSign16 = i16;
            type SameSign32 = i32;
            type SameSign64 = i64;
        }
        impl Scalar for $u {
            type Signed = $s;
            type Unsigned = $u;

            type SameSign8 = u8;
            type SameSign16 = u16;
            type SameSign32 = u32;
            type SameSign64 = u64;
        }
    )*};
}
scalar_impls!((i64, u64), (i32, u32), (i16, u16), (i8, u8));

/// A vector equivalent to `[T; Self::LANES]`.
///
/// # Representation
/// This type should have the same size as `[T; Self::LANES]`, though it may have increased
/// alignment requirements.
///
/// # Effects of signedness on shift operations
/// When `Scalar` is _signed_, shift operations are signed shifts. When `Scalar` is _unsigned_,
/// shift operations are unsigned shifts.
///
/// ## Example
/// A signed shift right will shift in copies of the sign bit:
/// ```
/// # use vectoreyes::*;
/// assert_eq!(
///     U64x2::from([0xffffffffffffffff, 0x2]) >> 1,
///     U64x2::from([0x7fffffffffffffff, 0x1]),
/// );
/// assert_eq!(
///     // Because the sign bit of 0xffffffffffffffff is 1, shifting right will cause a 1 to be
///     // inserted which, in this case, results in the same 0xffffffffffffffff value.
///     U64x2::from(I64x2::from(U64x2::from([0xffffffffffffffff, 0x2])) >> 1),
///     U64x2::from([0xffffffffffffffff, 0x1]),
/// );
/// ```
pub trait SimdBase:
    'static
    + Sized
    + Clone
    + Copy
    + Sync
    + Send
    + std::fmt::Debug
    + PartialEq
    + Eq
    + Default
    + bytemuck::Pod
    + bytemuck::Zeroable
    + BitXor
    + BitXorAssign
    + BitOr
    + BitOrAssign
    + BitAnd
    + BitAndAssign
    + AddAssign
    + Add<Output = Self>
    + SubAssign
    + Sub<Output = Self>
    + ShlAssign<u64>
    + Shl<u64, Output = Self>
    + ShrAssign<u64>
    + Shr<u64, Output = Self>
    + ShlAssign<Self>
    + Shl<Self, Output = Self>
    + ShrAssign<Self>
    + Shr<Self, Output = Self>
    + subtle::ConstantTimeEq
    + subtle::ConditionallySelectable
    + AsRef<[Self::Scalar]>
    + AsMut<[Self::Scalar]>
{
    /// The number of elements of this vector.
    const LANES: usize;

    /// The equivalent array type of this vector.
    type Array: 'static
        + Sized
        + Clone
        + Copy
        + Sync
        + Send
        + std::fmt::Debug
        + bytemuck::Pod
        + bytemuck::Zeroable
        + PartialEq
        + Eq
        + Default
        + std::hash::Hash
        + AsRef<[Self::Scalar]>
        + From<Self>
        + Into<Self>;

    /// The scalar that this value holds.
    type Scalar: Scalar;
    /// The signed version of this vector.
    type Signed: SimdBase<Scalar = <<Self as SimdBase>::Scalar as Scalar>::Signed>
        + From<Self>
        + Into<Self>;
    /// The unsigned version of this vector.
    type Unsigned: SimdBase<Scalar = <<Self as SimdBase>::Scalar as Scalar>::Unsigned>
        + From<Self>
        + Into<Self>;

    /// A vector where every element is zero.
    const ZERO: Self;
    /// Is `self == Self::ZERO`?
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert!(U32x4::from([0, 0, 0, 0]).is_zero());
    /// assert!(!U32x4::from([1, 0, 0, 0]).is_zero());
    /// ```
    fn is_zero(&self) -> bool;

    /// Create a new vector by setting element 0 to `value`, and the rest of the elements to `0`.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(U32x4::from([64, 0, 0, 0]), U32x4::set_lo(64));
    /// ```
    fn set_lo(value: Self::Scalar) -> Self;

    /// Create a new vector by setting every element to `value`.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(U32x4::from([64, 64, 64, 64]), U32x4::broadcast(64));
    /// ```
    fn broadcast(value: Self::Scalar) -> Self;

    /// A vector of `[Self::Scalar; 128 / (8 * std::mem::size_of::<Self::Scalar>())]`.
    type BroadcastLoInput: SimdBase<Scalar = Self::Scalar>;
    /// Create a vector by setting every element to element 0 of `of`.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(U32x4::from([1, 1, 1, 1]), U32x4::broadcast_lo(U32x4::from([1, 2, 3, 4])));
    /// ```
    fn broadcast_lo(of: Self::BroadcastLoInput) -> Self;

    /// Get the `I`-th element of this vector.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// let v = U32x4::from([1, 2, 3, 4]);
    /// assert_eq!(v.extract::<0>(), 1);
    /// assert_eq!(v.extract::<1>(), 2);
    /// assert_eq!(v.extract::<2>(), 3);
    /// assert_eq!(v.extract::<3>(), 4);
    /// ```
    fn extract<const I: usize>(&self) -> Self::Scalar;

    /// Convert the vector to an array.
    #[inline(always)]
    fn as_array(&self) -> Self::Array {
        (*self).into()
    }

    /// Shift each element left by `BITS`.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(U32x4::from([1, 2, 3, 4]).shift_left::<1>(), U32x4::from([2, 4, 6, 8]));
    /// ```
    fn shift_left<const BITS: usize>(&self) -> Self;
    /// Shift each element right by `BITS`.
    ///
    /// # Effects of Signedness
    /// When `T` is _signed_, this will shift in sign bits, as opposed to zeroes.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(U32x4::from([1, 2, 3, 4]).shift_right::<1>(), U32x4::from([0, 1, 1, 2]));
    /// assert_eq!(I32x4::from([-1, -2, -3, -4]).shift_right::<1>(), I32x4::from([-1, -1, -2, -2]));
    /// ```
    fn shift_right<const BITS: usize>(&self) -> Self;

    /// Compute `self & (!other)`.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U64x2::from([0b11, 0b00]).and_not(U64x2::from([0b10, 0b10])),
    ///     U64x2::from([0b01, 0b00]),
    /// );
    /// ```
    fn and_not(&self, other: Self) -> Self;

    /// Create a vector where each element is all 1's if the corresponding elements are equal, and
    /// all 0's otherwise.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U64x2::from([1, 2]).cmp_eq(U64x2::from([1, 3])),
    ///     U64x2::from([0xffffffffffffffff, 0]),
    /// );
    /// ```
    fn cmp_eq(&self, other: Self) -> Self;
    /// Create a vector where each element is all 1's if the element of `self` is greater than the
    /// corresponding element of `other`, and all 0's otherwise.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U64x2::from([1, 28]).cmp_gt(U64x2::from([1, 3])),
    ///     U64x2::from([0, 0xffffffffffffffff]),
    /// );
    /// ```
    fn cmp_gt(&self, other: Self) -> Self;

    /// Interleave the elements of the low half of `self` and `other`.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U32x4::from([101, 102, 103, 104]).unpack_lo(U32x4::from([201, 202, 203, 204])),
    ///     U32x4::from([101, 201, 102, 202]),
    /// );
    /// ```
    fn unpack_lo(&self, other: Self) -> Self;
    /// Interleave the elements of the high half of `self` and `other`.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U32x4::from([101, 102, 103, 104]).unpack_hi(U32x4::from([201, 202, 203, 204])),
    ///     U32x4::from([103, 203, 104, 204]),
    /// );
    /// ```
    fn unpack_hi(&self, other: Self) -> Self;

    /// Make a vector consisting of the maximum elements of `self` and `other`.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U32x4::from([1, 2, 3, 4]).max(U32x4::from([0, 9, 0, 0])),
    ///     U32x4::from([1, 9, 3, 4]),
    /// );
    /// ```
    fn max(&self, other: Self) -> Self;
    /// Make a vector consisting of the minimum elements of `self` and `other`.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U32x4::from([1, 2, 3, 4]).min(U32x4::from([0, 9, 0, 0])),
    ///     U32x4::from([0, 2, 0, 0]),
    /// );
    /// ```
    fn min(&self, other: Self) -> Self;
}

/// A vector supporting the gather operation (indexing into an array using indices from a vector).
pub trait SimdBaseGatherable<IV: SimdBase>: SimdBase {
    /// Construct a vector by reading the values at `base + indices[i]`.
    ///
    /// # Safety
    /// This operation is safe if `std::ptr::read(base.add(indices[i]))` is safe for all `i`.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// let arr: Vec<i32> = (0..=1024).map(|x| x + 1).collect();
    /// let out = unsafe {
    ///     // SAFETY: All the indices are within bounds.
    ///     I32x4::gather(arr.as_ptr(), U64x4::from([32, 647, 827, 920]))
    /// };
    /// assert_eq!(out, I32x4::from([33, 648, 828, 921]));
    /// ```
    unsafe fn gather(base: *const Self::Scalar, indices: IV) -> Self;
    /// Construct a vector where element `i` is read from `base + indices[i]` if the most
    /// significant bit of `mask[i]` is set, and is `src[i]` otherwise.
    ///
    /// # Safety
    /// This operation is safe if `std::ptr::read(base.add(indices[i]))` is safe for all `i`.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// let arr: Vec<i32> = (0..=1024).map(|x| x + 1).collect();
    /// let out = unsafe {
    ///     // SAFETY: All the indices are within bounds.
    ///     I32x4::gather_masked(
    ///         arr.as_ptr(),
    ///         U64x4::from([32, 647, 827, 920]),
    ///         I32x4::from([-1, -1, 0, 0]),
    ///         I32x4::from([1, 2, 3, 4]),
    ///     )
    /// };
    /// assert_eq!(out, I32x4::from([33, 648, 3, 4]));
    /// ```
    unsafe fn gather_masked(base: *const Self::Scalar, indices: IV, mask: Self, src: Self) -> Self;
}

/// A vector containing 4 lanes.
pub trait SimdBase4x: SimdBase {
    /// If `Bi` is true, then lane `i` will be filled from `if_true`. Otherwise, the lane
    /// will be filled from `self`.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U64x4::from([11, 12, 13, 14])
    ///         .blend::<true, true, true, false>(U64x4::from([21, 22, 23, 24])),
    ///     U64x4::from([11, 22, 23, 24]),
    /// );
    /// ```
    fn blend<const B3: bool, const B2: bool, const B1: bool, const B0: bool>(
        &self,
        if_true: Self,
    ) -> Self;
}

/// A vector containing 8 lanes.
pub trait SimdBase8x: SimdBase {
    /// If `Bi` is true, then lane `i` will be filled from `if_true`. Otherwise, the lane
    /// will be filled from `self`.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U32x8::from([11, 12, 13, 14, 15, 16, 17, 18])
    ///         .blend::<true, true, true, false, false, true, true, false>(
    ///             U32x8::from([21, 22, 23, 24, 25, 26, 27, 28])),
    ///     U32x8::from([11, 22, 23, 14, 15, 26, 27, 28]),
    /// );
    /// ```
    fn blend<
        const B7: bool,
        const B6: bool,
        const B5: bool,
        const B4: bool,
        const B3: bool,
        const B2: bool,
        const B1: bool,
        const B0: bool,
    >(
        &self,
        if_true: Self,
    ) -> Self;
}

/// A vector supporting saturating arithmetic on each entry.
///
/// Saturating operations clamp their outputs to the scalar's maximum or minimum value on
/// overflow/underflow.
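///
/// # Example
/// For instance, with unsigned bytes (using [`SimdBase::broadcast`] to build the inputs):
/// ```
/// # use vectoreyes::*;
/// // 250 + 10 would overflow a u8, so the sum saturates to u8::MAX.
/// assert_eq!(
///     U8x16::broadcast(250).saturating_add(U8x16::broadcast(10)),
///     U8x16::broadcast(255),
/// );
/// // 3 - 10 would underflow, so the difference saturates to u8::MIN.
/// assert_eq!(
///     U8x16::broadcast(3).saturating_sub(U8x16::broadcast(10)),
///     U8x16::broadcast(0),
/// );
/// ```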
pub trait SimdSaturatingArithmetic: SimdBase {
    /// Pairwise add vectors. On overflow, the entry's value becomes the maximum scalar value.
    fn saturating_add(&self, other: Self) -> Self;
    /// Pairwise subtract vectors. On underflow, the entry's value becomes the minimum scalar value.
    fn saturating_sub(&self, other: Self) -> Self;
}

/// A vector containing 8-bit values.
pub trait SimdBase8: SimdBase + SimdSaturatingArithmetic
where
    Self::Scalar: Scalar<Unsigned = u8, Signed = i8>,
{
    /// Split the vector into groups of 16 bytes. Within each group, shift the group's bytes left
    /// by `AMOUNT` positions.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U8x16::from([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]).shift_bytes_left::<1>(),
    ///     U8x16::from([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]),
    /// );
    /// ```
    fn shift_bytes_left<const AMOUNT: usize>(&self) -> Self;
    /// Split the vector into groups of 16 bytes. Within each group, shift the group's bytes right
    /// by `AMOUNT` positions.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U8x16::from([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]).shift_bytes_right::<1>(),
    ///     U8x16::from([2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 0]),
    /// );
    /// ```
    fn shift_bytes_right<const AMOUNT: usize>(&self) -> Self;
    /// Get the sign/most significant bits of the elements of the vector.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     (U8x16::from([0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1]) << 7).most_significant_bits(),
    ///     0b1111001001010000,
    /// );
    /// ```
    fn most_significant_bits(&self) -> u32;
}

/// A vector containing 16-bit values.
pub trait SimdBase16: SimdBase + SimdSaturatingArithmetic
where
    Self::Scalar: Scalar<Unsigned = u16, Signed = i16>,
{
    /// Shuffle within the lower 64 bits of each 128-bit subvector.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U16x16::from([
    ///         0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
    ///     ]).shuffle_lo::<0, 1, 1, 3>(),
    ///     U16x16::from([
    ///         3, 1, 1, 0, 4, 5, 6, 7, 11, 9, 9, 8, 12, 13, 14, 15
    ///     ]),
    /// );
    /// ```
    fn shuffle_lo<const I3: usize, const I2: usize, const I1: usize, const I0: usize>(
        &self,
    ) -> Self;
    /// Shuffle within the upper 64 bits of each 128-bit subvector.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U16x16::from([
    ///         0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
    ///     ]).shuffle_hi::<0, 1, 1, 3>(),
    ///     U16x16::from([
    ///         0, 1, 2, 3, 7, 5, 5, 4, 8, 9, 10, 11, 15, 13, 13, 12
    ///     ]),
    /// );
    /// ```
    fn shuffle_hi<const I3: usize, const I2: usize, const I1: usize, const I0: usize>(
        &self,
    ) -> Self;
}

/// A vector containing 32-bit values.
pub trait SimdBase32: SimdBase
where
    Self::Scalar: Scalar<Unsigned = u32, Signed = i32>,
{
    /// Shuffle within each 128-bit subvector.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U32x8::from([
    ///         0, 1, 2, 3, 4, 5, 6, 7
    ///     ]).shuffle::<0, 1, 1, 3>(),
    ///     U32x8::from([
    ///         3, 1, 1, 0, 7, 5, 5, 4
    ///     ]),
    /// );
    /// ```
    fn shuffle<const I3: usize, const I2: usize, const I1: usize, const I0: usize>(&self) -> Self;
}

/// A vector containing 64-bit values.
pub trait SimdBase64: SimdBase
where
    Self::Scalar: Scalar<Unsigned = u64, Signed = i64>,
{
    /// Zero out the upper 32 bits of each word, and then perform pairwise multiplication.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U64x4::from([6, 7, 8, 9]).mul_lo(U64x4::from([1, 2, 3, 4])),
    ///     U64x4::from([6, 14, 24, 36]),
    /// );
    /// assert_eq!(
    ///     U64x4::from([6, 7, 8, 9]).mul_lo(
    ///         U64x4::from([1, 2, 3, 4]) | U64x4::broadcast(u64::MAX << 32)
    ///     ),
    ///     U64x4::from([6, 14, 24, 36]),
    /// );
    /// ```
    fn mul_lo(&self, other: Self) -> Self;
}

/// A vector containing 4 64-bit values.
pub trait SimdBase4x64: SimdBase64 + SimdBase4x
where
    Self::Scalar: Scalar<Unsigned = u64, Signed = i64>,
{
    /// Shuffle the 64-bit values.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U64x4::from([0, 1, 2, 3]).shuffle::<0, 1, 1, 3>(),
    ///     U64x4::from([3, 1, 1, 0]),
    /// );
    /// ```
    fn shuffle<const I3: usize, const I2: usize, const I1: usize, const I0: usize>(&self) -> Self;
}

// TODO: deprecate the uses of from() everywhere and use traits/functions that make it obvious which
// casts are free and which aren't.

/// Lossily cast a vector by {zero,sign}-extending its values.
pub trait ExtendingCast<T: SimdBase>: SimdBase {
    /// Cast from one vector to another by sign- or zero-extending the values from the source
    /// until they fill the destination.
    ///
    /// The lowest-index values in `t` are kept. Any values which don't fit are discarded.
    ///
    /// # Example
    /// ```
    /// # use vectoreyes::*;
    /// assert_eq!(
    ///     U64x2::extending_cast_from(U32x4::from([1, 2, 3, 4])),
    ///     U64x2::from([1, 2]),
    /// );
    /// ```
    fn extending_cast_from(t: T) -> Self;
}

/// A [`Scalar`] type which has a vector type of length `N`.
///
/// See [`Simd`] for how this trait is used.
pub trait HasVector<const N: usize>: Scalar {
    /// The vector of `[Self; N]`.
    type Vector: SimdBase<Scalar = Self>;
}

/// An alternative way of naming SIMD types.
///
/// This allows for functions to be written which are generic in the type or length of a vector.
///
/// # Example
/// ```
/// # use vectoreyes::*;
/// type MyVector = Simd<u8, 16>; // The same as U8x16.
///
/// fn my_length_generic_code<const N: usize>(x: Simd<u32, N>, y: Simd<u32, N>) -> Simd<u32, N>
/// where
///     u32: HasVector<N>,
/// {
///     x + x + y
/// }
/// ```
pub type Simd<T, const N: usize> = <T as HasVector<N>>::Vector;

/// An AES block cipher, suitable for encryption.
///
/// This cipher can be used for encryption. Decryption operations are handled in the subtrait
/// [`AesBlockCipherDecrypt`].
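///
/// # Example
/// A sketch of a generic helper that works with any cipher implementing this trait (the helper
/// itself is hypothetical, not part of this crate):
/// ```
/// # use vectoreyes::*;
/// /// Encrypt every block of `blocks` in place.
/// fn encrypt_in_place<C: AesBlockCipher>(cipher: &C, blocks: &mut [U8x16]) {
///     for block in blocks.iter_mut() {
///         *block = cipher.encrypt(*block);
///     }
/// }
/// ```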
pub trait AesBlockCipher: 'static + Clone + Sync + Send {
    /// The type of the AES key.
    type Key: 'static + Clone + Sync + Send;

    /// Running `encrypt_many` with this many blocks will typically result in good
    /// performance.
    const BLOCK_COUNT_HINT: usize;

    /// Run the AES key schedule operation with a given key.
    fn new_with_key(key: Self::Key) -> Self;

    /// Encrypt a single 128-bit AES block.
    #[inline(always)]
    fn encrypt(&self, block: U8x16) -> U8x16 {
        self.encrypt_many([block])[0]
    }
    /// Encrypt an array of `N` 128-bit AES blocks using ECB mode.
    fn encrypt_many<const N: usize>(&self, blocks: [U8x16; N]) -> [U8x16; N]
    where
        array_utils::ArrayUnrolledOps: array_utils::UnrollableArraySize<N>;
}

/// An AES block cipher, suitable for encryption and decryption.
pub trait AesBlockCipherDecrypt: AesBlockCipher {
    /// Decrypt a single 128-bit AES block.
    #[inline(always)]
    fn decrypt(&self, block: U8x16) -> U8x16 {
        self.decrypt_many([block])[0]
    }
    /// Decrypt an array of `N` 128-bit AES blocks using ECB mode.
    fn decrypt_many<const N: usize>(&self, blocks: [U8x16; N]) -> [U8x16; N]
    where
        array_utils::ArrayUnrolledOps: array_utils::UnrollableArraySize<N>;
}

pub mod array_utils;
pub(crate) mod utils;

// We want to allow `which_lane * 0 + 0` expressions.
// These also allow for simpler generated code. For example, sometimes we have code which looks
// like:
//     let x: {{ty}};
//     x as u8
// When {{ty}} _is_ u8, this cast isn't necessary. But it's simpler to always insert it in the
// generated code.
#[allow(
    clippy::identity_op,
    clippy::erasing_op,
    clippy::unnecessary_cast,
    clippy::useless_conversion
)]
// Intel intrinsics have many arguments.
#[allow(clippy::too_many_arguments)]
// Our compressed code doesn't have newlines.
#[allow(clippy::suspicious_else_formatting)]
// You can't put inline(always) without a closure.
#[allow(clippy::redundant_closure)]
// These two lints let us have extra parentheses in the generated source (which makes generation
// easier).
#[allow(unused_parens)]
#[allow(clippy::needless_borrow)]
// </the two lints>
mod generated;
pub use generated::implementation::*;