Intrinsic Functions

The C and C++ intrinsic functions either allow for direct access to some hardware instructions or result in generation of inline code to perform some specialized functions. These intrinsic functions are processed completely by the compiler. In many cases, the generated code is one or two instructions. These are called functions because they are invoked with the syntax of function calls.

To get access to most of the intrinsic functions, the Cray C++ compiler requires that either the intrinsics.h file be included or that the intrinsic functions being called be explicitly declared. If the source code does not include intrinsics.h and the code cannot be modified, use the -h prototype_intrinsics option instead. If an intrinsic function is explicitly declared, the declaration must agree with the documentation; otherwise, the compiler treats the call as a call to a normal function, not the intrinsic function. The -h nointrinsics command line option causes the compiler to treat these calls as regular function calls and not as intrinsic function calls.

The built-in atomic memory intrinsic functions of the form __sync_* require neither an include file nor an explicit declaration.

The types of the arguments to intrinsic functions are checked by the compiler, and if any of the arguments do not have the correct type, a warning message is issued and the call is treated as a normal call to an external function. If the intention was to call an external function with the same name as an intrinsic function, change the external function name. The names used for the Cray C intrinsic functions are in the name space reserved for the implementation.

Several of these intrinsic functions have both a vector and a scalar version. If a vector version of an intrinsic function exists and the intrinsic is called within a vectorized loop, the compiler uses the vector version of the intrinsic. To find out whether a particular intrinsic function has a vector version, refer to the man page for that function.

Atomic Memory Operations

Atomic memory operations (AMOs), unlike other functions, cannot be interrupted by the system and, under certain conditions, allow multiple threads to safely modify the same variable. The AMO intrinsics support adding, subtracting, ANDing, NANDing, ORing, and XORing values, as well as comparing and swapping values.

Local AMOs operate on variables in the processor's local memory (cache domain); they do not use the network interface to access memory. Multiple threads using local atomic memory operations to access the same variable need to be running within the same processor cache domain, which implies that they must be running on the same node. Local AMOs are atomic with respect to each other. The compiler issues an error message if a user tries to apply a local AMO intrinsic to a Unified Parallel C (UPC) shared variable or Fortran coarray that is not local to the current thread.

Global AMOs use the network interface to access variables in memory. The variables may or may not be in the processor's local cache domain. Global AMOs are atomic with respect to each other. Global AMOs are used to modify a Unified Parallel C (UPC) shared variable or Fortran coarray and are available only when compiling UPC (-hupc) or coarray Fortran (-hcaf).

A global AMO uses a different mechanism for achieving atomicity than a local AMO, so local and global AMOs are not atomic with respect to each other. Global and local AMOs should not be used concurrently on the same memory location without synchronization.

It is possible to safely modify a variable using both atomic and non-atomic operations within a single UPC thread or Fortran image; however, if a thread or image modifies a variable with an atomic operation and a different thread or image concurrently modifies the same variable with a non-atomic operation, the result is indeterminate.

For synopses of AMO functions, see the amo(3i) man page.

Local Atomic Memory Operations

The following functions, defined in intrinsics.h, perform various local atomic memory operations:
__builtin_ia32_lfence
(Load fence) Ensures that all memory loads issued before this intrinsic are visible in memory before any future loads are executed.
__builtin_ia32_sfence
(Store fence) Ensures that all memory stores issued before this intrinsic are visible in memory before any future stores are executed.
__builtin_ia32_mfence
(Memory fence) Ensures that all memory stores and loads issued before this intrinsic are visible in memory before any future stores or loads are executed.
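
As a rough illustration of where a store fence and a load fence might sit, the sketch below writes a payload, fences, and then raises a flag; the reader checks the flag, fences, and then reads the payload. The names payload, ready, publish, try_consume, and fence_demo are hypothetical. The sketch assumes GCC or Clang on x86-64, where these builtins are available without a header, and falls back to __sync_synchronize() elsewhere. It runs in a single thread only to show the call sites; real multithreaded code would also need volatile or atomic accesses to the shared variables.

```c
#include <assert.h>

/* Hypothetical shared data for illustration; not part of any Cray header. */
static int payload;
static int ready;

/* Assumption: GCC or Clang targeting x86-64 provides the fence builtins.
   Other targets fall back to a full barrier in this sketch. */
#if defined(__x86_64__) && defined(__GNUC__)
#define STORE_FENCE() __builtin_ia32_sfence()
#define LOAD_FENCE()  __builtin_ia32_lfence()
#else
#define STORE_FENCE() __sync_synchronize()
#define LOAD_FENCE()  __sync_synchronize()
#endif

/* Producer side: store the data, fence, then raise the flag. */
void publish(int value)
{
    payload = value;
    STORE_FENCE();   /* earlier stores become visible before later stores */
    ready = 1;
}

/* Consumer side: returns 1 and copies the payload if it has been published. */
int try_consume(int *out)
{
    if (!ready)
        return 0;
    LOAD_FENCE();    /* the flag load completes before the payload load */
    *out = payload;
    return 1;
}

/* Single-threaded walk-through of the two call sites. */
void fence_demo(void)
{
    int v = -1;
    int ok = try_consume(&v);
    assert(ok == 0 && v == -1);   /* nothing published yet */
    publish(42);
    ok = try_consume(&v);
    assert(ok == 1 && v == 42);
}
```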

Functions built into the compiler do not require an include file, nor a specific compilation option for use. The following local atomic, built-in functions return the value of the object before the named operation occurs.

In this discussion, an object is an entity that is referred to by a pointer. A value is an actual number, bit mask, etc. that is not referred to by a pointer. The allowed object and value types are signed and unsigned integer types of 1, 2, 4, or 8 bytes.
  • The __sync_fetch_and_add function fetches the object pointed to by ptr, adds value, places the result into the object pointed to by ptr, and returns the old value of the object pointed to by ptr.
  • The __sync_fetch_and_sub function fetches the object pointed to by ptr, subtracts value, places the result into the object pointed to by ptr, and returns the old value of the object pointed to by ptr.
  • The __sync_fetch_and_or function fetches the object pointed to by ptr, ORs value, places the result into the object pointed to by ptr, and returns the old value of the object pointed to by ptr.
  • The __sync_fetch_and_and function fetches the object pointed to by ptr, ANDs value, places the result into the object pointed to by ptr, and returns the old value of the object pointed to by ptr.
  • The __sync_fetch_and_xor function fetches the object pointed to by ptr, XORs value, places the result into the object pointed to by ptr, and returns the old value of the object pointed to by ptr.
  • The __sync_fetch_and_nand function fetches the object pointed to by ptr, NANDs value, places the result into the object pointed to by ptr, and returns the old value of the object pointed to by ptr.
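
The fetch-first behavior described above can be sketched as follows. This is a minimal example, assuming a compiler that provides the __sync_* builtins (GCC and Clang do); the helper names set_flags_was_set and fetch_and_op_demo are hypothetical. Each call returns the value the object held before the operation, which is what lets set_flags_was_set report whether a bit was already set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sets the bits in mask and reports whether any of
   them was already set, based on the old value that is returned. */
int set_flags_was_set(uint32_t *flags, uint32_t mask)
{
    uint32_t old = __sync_fetch_and_or(flags, mask);
    return (old & mask) != 0;
}

/* Walks through the fetch-and-op builtins on a 4-byte object. */
void fetch_and_op_demo(void)
{
    uint32_t x = 0xF0;
    uint32_t old;

    old = __sync_fetch_and_add(&x, 1);     /* returns 0xF0, x becomes 0xF1 */
    assert(old == 0xF0 && x == 0xF1);

    old = __sync_fetch_and_sub(&x, 1);     /* returns 0xF1, x becomes 0xF0 */
    assert(old == 0xF1 && x == 0xF0);

    old = __sync_fetch_and_or(&x, 0x0F);   /* returns 0xF0, x becomes 0xFF */
    assert(old == 0xF0 && x == 0xFF);

    old = __sync_fetch_and_and(&x, 0x3C);  /* returns 0xFF, x becomes 0x3C */
    assert(old == 0xFF && x == 0x3C);

    old = __sync_fetch_and_xor(&x, 0xFF);  /* returns 0x3C, x becomes 0xC3 */
    assert(old == 0x3C && x == 0xC3);
}
```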
The following local atomic, built-in functions return the value of the object after the named operation occurs:
  • The __sync_add_and_fetch function adds value to the object pointed to by ptr and returns the new value of the object pointed to by ptr.
  • The __sync_sub_and_fetch function subtracts value from the object pointed to by ptr and returns the new value of the object pointed to by ptr.
  • The __sync_or_and_fetch function ORs value with the object pointed to by ptr and returns the new value of the object pointed to by ptr.
  • The __sync_and_and_fetch function ANDs value with the object pointed to by ptr and returns the new value of the object pointed to by ptr.
  • The __sync_xor_and_fetch function XORs value with the object pointed to by ptr and returns the new value of the object pointed to by ptr.
  • The __sync_nand_and_fetch function NANDs value with the object pointed to by ptr and returns the new value of the object pointed to by ptr.
The following local atomic, built-in functions return the value of the object before the operation occurs:
  • The __sync_val_compare_and_swap function performs an atomic compare and swap. If the current value of *ptr is compareValue, the function writes replacementValue into *ptr. In either case, it returns the contents of *ptr before the operation.
  • The __sync_lock_test_and_set function writes value into *ptr and returns the previous contents of *ptr.
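
A short sketch may help contrast these return values. It assumes a compiler that provides the __sync_* builtins (GCC and Clang do). The helpers atomic_max, claim_once, and op_and_fetch_demo are hypothetical names introduced for illustration: atomic_max shows the usual compare-and-swap retry loop, and claim_once uses the previous contents returned by __sync_lock_test_and_set to let exactly one caller win.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: raises *ptr to candidate if candidate is larger.
   Returns the value observed just before the final decision. */
uint32_t atomic_max(uint32_t *ptr, uint32_t candidate)
{
    uint32_t seen = *ptr;
    while (seen < candidate) {
        uint32_t prev = __sync_val_compare_and_swap(ptr, seen, candidate);
        if (prev == seen)
            break;       /* the swap took place */
        seen = prev;     /* *ptr changed underneath us; retry with new value */
    }
    return seen;
}

/* Hypothetical helper: returns 1 for the first caller, 0 for later ones. */
int claim_once(int *flag)
{
    return __sync_lock_test_and_set(flag, 1) == 0;
}

/* Op-and-fetch returns the new value; compare-and-swap returns the old one. */
void op_and_fetch_demo(void)
{
    uint32_t x = 10;
    uint32_t r;

    r = __sync_add_and_fetch(&x, 5);              /* returns new value 15 */
    assert(r == 15 && x == 15);

    r = __sync_sub_and_fetch(&x, 3);              /* returns new value 12 */
    assert(r == 12 && x == 12);

    r = __sync_val_compare_and_swap(&x, 99, 1);   /* no match: x unchanged */
    assert(r == 12 && x == 12);

    r = __sync_val_compare_and_swap(&x, 12, 1);   /* match: x becomes 1 */
    assert(r == 12 && x == 1);
}
```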

Global Atomic Memory Operations

Global atomic memory operations (global AMO) are typically used to atomically modify a Unified Parallel C (UPC) shared variable or Fortran coarray.

The target of a global AMO can be located in a different cache domain, so a global AMO is not atomic with respect to memory operations performed locally within the target's cache domain. Therefore, the application must use synchronization to ensure that global AMOs and local memory operations are not used concurrently on the same memory location.

The following intrinsics are defined in intrinsics.h. Functions without the _upc suffix accept both shared and non-shared pointers as the first argument. Functions with the _upc suffix accept only shared pointers as the first argument.

In this discussion, an object is an entity that is referred to by a pointer. A value is an actual number, bit mask, etc. that is not referred to by a pointer.
  • The _amo_aadd and _amo_aadd_upc functions (atomic add) add value to the object pointed to by ptr.
  • The _amo_aaddf and _amo_aaddf_upc functions (atomic add and fetch) add value to the object pointed to by ptr and return the new value.
  • The _amo_afadd and _amo_afadd_upc functions (atomic fetch and add) add value to the object pointed to by ptr and return the old value of the object.
  • The _amo_aax and _amo_aax_upc functions (atomic AND and XOR) AND the object pointed to by ptr with andMask, XOR the result with xorMask, and place the result into the object.
  • The _amo_afax and _amo_afax_upc functions (atomic fetch and AND and XOR) AND the object pointed to by ptr with andMask, XOR the result with xorMask, place the result into the object, and return the old value of the object.
  • The _amo_aandf and _amo_aandf_upc functions (atomic AND and fetch) AND the object pointed to by ptr with value, place the result into the object, and return the new value of the object.
  • The _amo_afand and _amo_afand_upc functions (atomic fetch and AND) AND the object pointed to by ptr with value, place the result into the object, and return the old value of the object.
  • The _amo_anandf and _amo_anandf_upc functions (atomic NAND and fetch) NAND the object pointed to by ptr with value, place the result into the object, and return the new value of the object.
  • The _amo_afnand and _amo_afnand_upc functions (atomic fetch and NAND) NAND the object pointed to by ptr with value, place the result into the object, and return the old value of the object.
  • The _amo_aorf and _amo_aorf_upc functions (atomic OR and fetch) OR the object pointed to by ptr with value, place the result into the object, and return the new value of the object.
  • The _amo_afor and _amo_afor_upc functions (atomic fetch and OR) OR the object pointed to by ptr with value, place the result into the object, and return the old value of the object.
  • The _amo_axorf and _amo_axorf_upc functions (atomic XOR and fetch) XOR the object pointed to by ptr with value, place the result into the object, and return the new value of the object.
  • The _amo_afxor and _amo_afxor_upc functions (atomic fetch and XOR) XOR the object pointed to by ptr with value, place the result into the object, and return the old value of the object.
  • The _amo_acswap and _amo_acswap_upc functions (atomic compare and swap) replace the contents of the object pointed to by ptr with replacementValue if the object is equal to compareValue, and return the old value of the object.
  • The _amo_aswap and _amo_aswap_upc functions (atomic swap) swap a value by replacing the contents of the object pointed to by ptr with replacementValue. This function always returns the old value.
  • The _amo_aflush and _amo_aflush_upc functions (atomic flush) force *ptr to be written to memory.
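
The combined AND and XOR update described in the list above is less familiar than the other operations, so a portable sketch of its arithmetic may be useful. The function sketch_afax below is a hypothetical stand-in, not the Cray intrinsic: it uses a local compare-and-swap loop, assumes a 64-bit unsigned object, and says nothing about how the real global AMOs reach memory over the network. It only shows that the new value is (old AND andMask) XOR xorMask and that the old value is returned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fetch-and-AND-XOR arithmetic.
   NOT the Cray _amo_afax intrinsic; types and name are assumptions. */
uint64_t sketch_afax(uint64_t *ptr, uint64_t and_mask, uint64_t xor_mask)
{
    uint64_t old = *ptr;
    for (;;) {
        uint64_t desired = (old & and_mask) ^ xor_mask;
        uint64_t prev = __sync_val_compare_and_swap(ptr, old, desired);
        if (prev == old)
            return old;  /* update applied; report the previous contents */
        old = prev;      /* object changed concurrently; recompute and retry */
    }
}

void afax_demo(void)
{
    uint64_t obj = 0xAB;
    uint64_t old;

    /* Keep the high nibble, then flip bits 0 and 2: 0xA0 ^ 0x05 = 0xA5. */
    old = sketch_afax(&obj, 0xF0, 0x05);
    assert(old == 0xAB && obj == 0xA5);

    /* Replace the low nibble with 0x7: clear it with the AND mask,
       then XOR the new field value into the cleared bits. */
    old = sketch_afax(&obj, ~UINT64_C(0xF), 0x7);
    assert(old == 0xA5 && obj == 0xA7);
}
```

With suitable masks, the same update can express a plain store (andMask of 0), an OR (andMask of ~m, xorMask of m), an AND (xorMask of 0), or an XOR (andMask of all ones).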

For more information, see the amo(3i) man page.

Intrinsic Bit Operations

These intrinsic functions copy, count, or shift bits, or compute the parity bit.
_dshiftl
Moves the leftmost n bits of an integer into the right side of another integer and returns that integer.
_dshiftr
Moves the rightmost n bits of an integer into the left side of another integer and returns that integer.
_pbit
Copies the rightmost bit of a word to the nth bit, from the right, of another word.
_pbits
Copies the rightmost m bits of a word to another word beginning at bit n.
_poppar
Computes the parity bit for a variable.
_popcnt
_popcnt32
_popcnt64
Counts the number of set bits in 32-bit and 64-bit integer words.
_leadz
_leadz32
_leadz64
Counts the number of leading 0 bits in 32-bit and 64-bit integer words.
_gbit
Returns the value of the nth bit from the right.
_gbits
Returns a value consisting of m bits extracted from a variable, beginning at the nth bit from the right.
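
The descriptions above can be made concrete with portable reference models. The functions below are hypothetical (the ref_ prefix marks them as illustrations, not the Cray intrinsics), and their argument order is an assumption rather than the documented prototypes. They assume 64-bit words, number bits from 0 at the right, and treat an extracted field as occupying bits n through n+m-1.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits. */
int ref_popcnt64(uint64_t x)
{
    int n = 0;
    while (x) { x &= x - 1; n++; }   /* clear the lowest set bit each pass */
    return n;
}

/* Parity bit: 1 if the number of set bits is odd. */
int ref_poppar64(uint64_t x) { return ref_popcnt64(x) & 1; }

/* Number of leading zero bits; 64 for a zero word. */
int ref_leadz64(uint64_t x)
{
    int n = 0;
    if (x == 0) return 64;
    while (!(x >> 63)) { x <<= 1; n++; }
    return n;
}

/* Value of bit n (0..63), counted from the right. */
uint64_t ref_gbit(uint64_t x, unsigned n) { return (x >> n) & 1u; }

/* m bits (1..64) of x starting at bit n from the right; n + m <= 64. */
uint64_t ref_gbits(uint64_t x, unsigned m, unsigned n)
{
    uint64_t mask = (m >= 64) ? ~UINT64_C(0) : (UINT64_C(1) << m) - 1;
    return (x >> n) & mask;
}

/* Copy the rightmost bit of src into bit n (0..63) of dst. */
uint64_t ref_pbit(uint64_t dst, unsigned n, uint64_t src)
{
    return (dst & ~(UINT64_C(1) << n)) | ((src & 1u) << n);
}

/* Shift hi left by n (0..64), filling from the leftmost n bits of lo. */
uint64_t ref_dshiftl(uint64_t hi, uint64_t lo, unsigned n)
{
    if (n == 0) return hi;
    if (n >= 64) return lo;
    return (hi << n) | (lo >> (64 - n));
}

void bit_ops_demo(void)
{
    assert(ref_popcnt64(0xF0F0) == 8);
    assert(ref_poppar64(0x7) == 1);
    assert(ref_leadz64(1) == 63);
    assert(ref_gbit(0x8, 3) == 1);
    assert(ref_gbits(0xABCD, 8, 4) == 0xBC);
    assert(ref_pbit(0, 5, 1) == 0x20);
    assert(ref_dshiftl(0x1, UINT64_C(0x8000000000000000), 4) == 0x18);
}
```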

Intrinsic Mask Operations

These intrinsic functions create bit masks.
_mask
Creates a left-justified or right-justified bit mask with all bits set to 1.
_maskl
Returns a left-justified bit mask with i bits set to 1.
_maskr
Returns a right-justified bit mask with i bits set to 1.
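
To show what left-justified and right-justified masks look like, here is a small portable sketch. The names ref_mask_left and ref_mask_right are hypothetical, and a 64-bit word with i in the range 0 through 64 is assumed; the real intrinsics may differ in word size and accepted range.

```c
#include <assert.h>
#include <stdint.h>

/* Right-justified mask: the i rightmost bits are 1 (i in 0..64). */
uint64_t ref_mask_right(unsigned i)
{
    return (i >= 64) ? ~UINT64_C(0) : (UINT64_C(1) << i) - 1;
}

/* Left-justified mask: the i leftmost bits are 1 (i in 0..64). */
uint64_t ref_mask_left(unsigned i)
{
    if (i == 0) return 0;
    if (i >= 64) return ~UINT64_C(0);
    return ~UINT64_C(0) << (64 - i);
}

void mask_demo(void)
{
    assert(ref_mask_right(4) == UINT64_C(0xF));
    assert(ref_mask_left(4) == UINT64_C(0xF000000000000000));
    /* A left mask of i bits and a right mask of 64 - i bits tile the word. */
    assert((ref_mask_left(20) | ref_mask_right(44)) == ~UINT64_C(0));
    assert((ref_mask_left(20) & ref_mask_right(44)) == 0);
}
```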

Miscellaneous Intrinsic Operations

These intrinsic functions perform various operations.
_int_mult_upper
Multiplies integers and returns the uppermost bits.
_ranf
Computes a pseudo-random floating-point number ranging from 0.0 through 1.0.
_rtc
Returns a real-time clock value expressed in clock ticks.
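
As an illustration of what "the uppermost bits" of a product means, the sketch below computes the upper 64 bits of the 128-bit product of two unsigned 64-bit integers using 32-bit partial products. The name ref_mult_upper is hypothetical, and the operand width and signedness are assumptions; consult the man page for the actual prototype of _int_mult_upper.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 64 bits of the full 128-bit product a * b (unsigned operands). */
uint64_t ref_mult_upper(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Sum the middle 32-bit column; its carry feeds the upper half. */
    uint64_t mid = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + (lo_hi & 0xFFFFFFFFu);

    return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
}

void mult_upper_demo(void)
{
    /* Small products fit in the lower word, so the upper word is 0. */
    assert(ref_mult_upper(3, 5) == 0);
    /* 2^32 * 2^32 = 2^64, whose upper word is 1. */
    assert(ref_mult_upper(UINT64_C(1) << 32, UINT64_C(1) << 32) == 1);
    /* (2^64 - 1)^2 = 2^128 - 2^65 + 1, whose upper word is 2^64 - 2. */
    assert(ref_mult_upper(~UINT64_C(0), ~UINT64_C(0)) == UINT64_C(0xFFFFFFFFFFFFFFFE));
}
```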