TBE CPU Autovectorization

FP8/16/32 Autovec Implementation Methods

template<typename InType = std::uint8_t, typename IndexType = std::int64_t, typename OffsetType = std::int32_t, typename OutType = float>
bool EmbeddingSpMDM_autovec(const std::int64_t block_size, const std::int64_t output_size, const std::int64_t index_size, const std::int64_t data_size, const InType *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, bool is_weight_positional = false, bool use_offsets = true, std::int64_t output_stride = -1, std::int64_t input_stride = -1, bool scale_bias_last = true, bool no_bag = false, bool is_bf16_out = false, bool is_bf16_in = false)

Autovectorized version of method EmbeddingSpMDM_ref for FP32 weight type.

Template Parameters:
  • InType – input data type (uint8_t is used)

  • IndexType – index data type (int64_t is used)

  • OffsetType – offset data type (int32_t is used)

  • OutType – output data type (float is used)

Parameters:
  • block_size – Number of elements in a block (int64_t)

  • output_size – Number of elements in output (int64_t)

  • index_size – Number of elements in index (int64_t)

  • data_size – Number of elements in data (int64_t)

  • input – Address of input (InType*)

  • indices – Address of index (IndexType*)

  • offsets_or_lengths – Address of offsets, or of lengths when use_offsets is false (OffsetType*)

  • weights – Weights of sum; optional, can be null for non-weighted sum (float*)

  • normalize_by_lengths – Whether or not to normalize by lengths (bool)

  • out – Address of output (OutType*)

  • is_weight_positional – If true, weight is positional; set to false for FP32 autovec implementation (bool)

  • use_offsets – If true, will use offsets instead of lengths; set to true for FP32 autovec implementation (bool)

  • output_stride – If -1, output_stride is same as block_size; set to -1 for FP32 autovec implementation (int64_t)

  • input_stride – If -1, input_stride is same as block_size; set to -1 for FP32 autovec implementation (int64_t)

  • scale_bias_last – If true, scale and bias appear at end of each row; set to true for FP32 autovec implementation (bool)

  • no_bag – If true, no embedding bag; set to false for FP32 autovec implementation (bool)

  • is_bf16_out – If true, output is BFLOAT16 type; set to false for FP32 autovec implementation (bool)

  • is_bf16_in – If true, input is BFLOAT16 type; set to false for FP32 autovec implementation (bool)
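
The parameters above describe a pooled ("bag") sum over embedding rows. As a rough illustration of how block_size, indices, offsets, weights and normalize_by_lengths interact, here is a minimal scalar sketch. The name pooled_sum_sketch is hypothetical and this is not the FBGEMM implementation. It assumes float rows, use_offsets set to true, both strides equal to block_size, data_size counted in rows, and an offsets array with output_size + 1 entries.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar illustration of the pooled sum; NOT the FBGEMM code.
// Assumptions: float input rows, use_offsets == true, strides == block_size,
// data_size is the number of rows, offsets holds output_size + 1 entries.
inline bool pooled_sum_sketch(
    std::int64_t block_size,
    std::int64_t output_size,
    std::int64_t index_size,
    std::int64_t data_size,
    const float* input,
    const std::int64_t* indices,
    const std::int32_t* offsets,
    const float* weights, // may be nullptr for an unweighted sum
    bool normalize_by_lengths,
    float* out) {
  assert(block_size > 0 && output_size >= 0);
  for (std::int64_t bag = 0; bag < output_size; ++bag) {
    float* dst = out + bag * block_size;
    for (std::int64_t j = 0; j < block_size; ++j) {
      dst[j] = 0.0f;
    }
    const std::int64_t begin = offsets[bag];
    const std::int64_t end = offsets[bag + 1];
    if (begin < 0 || end < begin || end > index_size) {
      return false; // malformed offsets
    }
    for (std::int64_t i = begin; i < end; ++i) {
      const std::int64_t row = indices[i];
      if (row < 0 || row >= data_size) {
        return false; // index outside the table
      }
      const float w = weights ? weights[i] : 1.0f;
      const float* src = input + row * block_size;
      // Contiguous, branch-free inner loop: the kind of loop a compiler
      // can typically auto-vectorize.
      for (std::int64_t j = 0; j < block_size; ++j) {
        dst[j] += w * src[j];
      }
    }
    if (normalize_by_lengths && end > begin) {
      const float inv = 1.0f / static_cast<float>(end - begin);
      for (std::int64_t j = 0; j < block_size; ++j) {
        dst[j] *= inv;
      }
    }
  }
  return true;
}
```

With a 3-row table of width 2, indices {0, 2, 1} and offsets {0, 2, 3}, the sketch sums rows 0 and 2 into the first output row and copies row 1 into the second. The false return on a bad offset or index is a choice made for this sketch.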

template<typename IndexType, typename OffsetType, typename OutType>
bool EmbeddingSpMDMFP8_autovec(const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const uint8_t *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, bool is_weight_positional, bool use_offsets, int64_t output_stride, int64_t input_stride, int exponent_bits, int exponent_bias, bool is_bf16_out)

Autovectorized version of method EmbeddingSpMDM_ref for FP8 weight type.

Template Parameters:
  • IndexType – index data type (int64_t is used)

  • OffsetType – offset data type (int32_t is used)

  • OutType – output data type (float is used)

Parameters:
  • block_size – Number of elements in a block (int64_t)

  • output_size – Number of elements in output (int64_t)

  • index_size – Number of elements in index (int64_t)

  • data_size – Number of elements in data (int64_t)

  • input – Address of input (uint8_t*)

  • indices – Address of index (IndexType*)

  • offsets_or_lengths – Address of offsets, or of lengths when use_offsets is false (OffsetType*)

  • weights – Weights of sum; optional, can be null for non-weighted sum (float*)

  • normalize_by_lengths – Whether or not to normalize by lengths (bool)

  • out – Address of output (OutType*)

  • is_weight_positional – If true, weight is positional; set to false for FP8 autovec implementation (bool)

  • use_offsets – If true, will use offsets instead of lengths; set to true for FP8 autovec implementation (bool)

  • output_stride – If -1, output_stride is same as block_size; set to -1 for FP8 autovec implementation (int64_t)

  • input_stride – If -1, input_stride is same as block_size (int64_t)

  • exponent_bits – Number of exponent bits in the FP8 format (int)

  • exponent_bias – Exponent bias of the FP8 format (int)

  • is_bf16_out – If true, output is BFLOAT16 type; set to false for FP8 autovec implementation (bool)
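
To make exponent_bits and exponent_bias concrete, the sketch below decodes one FP8 byte into a float. It assumes a layout of 1 sign bit, exponent_bits exponent bits and 7 - exponent_bits mantissa bits, with subnormals when the exponent field is zero and no special handling of infinities or NaNs. That layout is an assumption made for illustration, and fp8_to_float_sketch is a hypothetical helper, not a function from the library.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical FP8 decoder for illustration only.
// Assumed layout (MSB to LSB): 1 sign bit, exponent_bits exponent bits,
// (7 - exponent_bits) mantissa bits. Inf/NaN encodings are not handled.
inline float fp8_to_float_sketch(
    std::uint8_t bits,
    int exponent_bits,
    int exponent_bias) {
  assert(exponent_bits >= 1 && exponent_bits <= 7);
  const int mantissa_bits = 7 - exponent_bits;
  const int sign = bits >> 7;
  const int exponent = (bits >> mantissa_bits) & ((1 << exponent_bits) - 1);
  const int mantissa = bits & ((1 << mantissa_bits) - 1);
  // Mantissa field interpreted as a fraction in [0, 1).
  const float fraction =
      std::ldexp(static_cast<float>(mantissa), -mantissa_bits);
  const float magnitude = (exponent == 0)
      // Subnormal: no implicit leading 1, fixed exponent of 1 - bias.
      ? std::ldexp(fraction, 1 - exponent_bias)
      // Normal: implicit leading 1, exponent shifted by the bias.
      : std::ldexp(1.0f + fraction, exponent - exponent_bias);
  return sign ? -magnitude : magnitude;
}
```

Under these assumptions, with exponent_bits = 4 and exponent_bias = 7, the byte 0x38 (sign 0, exponent field 7, mantissa 0) decodes to 1.0 and 0x3C decodes to 1.5. A larger exponent_bias shifts the representable range toward smaller magnitudes.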
