﻿<?xml version="1.0"?>
<!DOCTYPE flagsdescription SYSTEM "http://www.spec.org/dtd/cpuflags2.dtd">
<flagsdescription>

<filename>Dell-Platform-Flags-PowerEdge-AMD-EPYC-v1.8</filename>

<title>Platform Settings for Dell PowerEdge Servers</title>

<os_tuning>
 <![CDATA[

   <dl>

    <dt><b>kernel.randomize_va_space</b> (ASLR)</dt>
   <dd>
     This setting selects the type of process address space
     randomization. The default depends on whether the architecture supports
     ASLR, whether the kernel was built with the CONFIG_COMPAT_BRK
     option, and on the kernel boot options used.<br />
     Possible settings:
     <ul>
        <li>0: Turn process address space randomization off.</li>
	<li>1: Randomize addresses of mmap base, stack, and VDSO pages.</li>
	<li>2: Additionally randomize the heap. (This is the default on most systems.)</li>
     </ul>
     Disabling ASLR can make process execution more deterministic and runtimes more consistent.
     For more information see the <kbd>randomize_va_space</kbd> entry in the
     <a href="https://www.kernel.org/doc/Documentation/sysctl/kernel.txt">Linux sysctl
	     documentation</a>.
    </dd>
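As a sketch (paths and values assume a recent Linux kernel), the ASLR mode can be inspected and changed with <kbd>sysctl</kbd>:

```shell
# Show the current ASLR mode (typically 2)
sysctl -n kernel.randomize_va_space

# Disable ASLR for more deterministic run-to-run behavior (requires root)
sysctl -w kernel.randomize_va_space=0

# Persist across reboots via a drop-in file, then reload all settings
echo 'kernel.randomize_va_space = 0' > /etc/sysctl.d/90-aslr.conf
sysctl --system
```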

    <dt><br/><b>Transparent Hugepages (THP)</b></dt>
    <dd>
      THP is an abstraction layer that automates most aspects of creating, managing,
      and using huge pages. It is designed to hide much of the complexity in using
      huge pages from system administrators and developers. Huge pages
      increase the memory page size from 4 kilobytes to 2 megabytes. This provides
      significant performance advantages on systems with highly contended resources
      and large memory workloads. If memory utilization is too high, or memory is so badly
      fragmented that huge pages cannot be allocated, the kernel falls back to
      smaller 4 KB pages instead. Most recent Linux OS releases have THP enabled by default.<br />
      THP usage is controlled by the sysfs setting <kbd>/sys/kernel/mm/transparent_hugepage/enabled</kbd>.
      Possible values:
      <ul>
         <li>never: entirely disable THP usage.</li>
	 <li>madvise: enable THP usage only inside regions marked MADV_HUGEPAGE using madvise(2).</li>
	 <li>always: enable THP usage system-wide. This is the default.</li>
      </ul>
      THP creation is controlled by the sysfs setting <kbd>/sys/kernel/mm/transparent_hugepage/defrag</kbd>.
      Possible values:
      <ul>
	 <li>never: if no THP are available to satisfy a request, do not attempt to make any.</li>
	 <li>defer: an allocation requesting THP when none are available gets normal pages while THP creation is requested in the background.</li>
	 <li>defer+madvise: acts like "always", but only for allocations in regions marked MADV_HUGEPAGE using madvise(2); for all other regions it acts like "defer".</li>
	 <li>madvise: acts like "always", but only for allocations in regions marked MADV_HUGEPAGE using madvise(2). This is the default.</li>
	 <li>always: an allocation requesting THP when none are available will stall until some are made.</li>
       </ul>
       Under "always", an application requesting THP can often benefit from stalling the allocation until huge pages can be assembled.<br/>
       For more information see the <a href="https://www.kernel.org/doc/Documentation/vm/transhuge.txt">Linux transparent hugepage documentation</a>.
   </dd>
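A minimal sketch of inspecting and setting these sysfs knobs on a recent Linux kernel (the bracketed word in the output marks the active value):

```shell
# Show the active THP policy, e.g. "[always] madvise never"
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

# Enable system-wide THP with synchronous defrag (requires root)
echo always > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/defrag

# Check how much anonymous memory is currently backed by huge pages
grep AnonHugePages /proc/meminfo
```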

   <dt><br/><b>drop_caches</b></dt>
   <dd>
   The sysctl utility changes kernel parameters at run time.
   <br/>
   <kbd>sysctl -w vm.drop_caches=3</kbd> frees the page cache together with reclaimable slab objects such as dentries and inodes.
   </dd>
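A sketch of the usual invocation (requires root); drop_caches only releases clean objects, so syncing first maximizes what can be freed:

```shell
# Flush dirty pages to disk, then drop the page cache plus
# reclaimable slab objects (dentries and inodes)
sync
sysctl -w vm.drop_caches=3   # 1 = page cache, 2 = slab, 3 = both
```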

    <dt><br/><b>tuned-adm</b></dt>
    <dd>
    This command-line utility allows you to switch between user-definable tuning profiles.
    Several predefined profiles are included. You can also create your own profile,
    either by copying and modifying an existing one or by writing a completely new one.
    The distribution-provided profiles are stored in subdirectories below /usr/lib/tuned,
    and user-defined profiles in subdirectories below /etc/tuned. If profiles
    with the same name exist in both places, the user-defined profile takes precedence.
    <br/>
    <br/>
    Profiles Used:
    <ul>
	 <li><b>throughput-performance</b>: Broadly applicable tuning that provides excellent performance across a variety of common server workloads.</li>
	 <li><b>latency-performance</b>: Optimize for deterministic performance at the cost of increased power consumption.</li>
    </ul>
   </dd>
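For illustration, the profiles above can be listed and applied as follows (switching requires root):

```shell
# List available profiles and show the currently active one
tuned-adm list
tuned-adm active

# Switch to the throughput-oriented profile
tuned-adm profile throughput-performance

# Check that the system still matches the profile's settings
tuned-adm verify
```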

   </dl>
]]> 
</os_tuning>

<firmware>
 <![CDATA[

<dl>
  <dt><br/><b>DRAM Refresh Delay</b></dt>
  <dd>
    Default: Minimum
    <br />
    <ul>
      <li>Minimum: Minimizing the delay time ensures that the memory controller issues the REFRESH command at regular intervals.
      </li>
      <li>Performance: Allowing the CPU memory controller to delay the REFRESH command can improve performance for some workloads.
      </li>
    </ul>
  </dd>
 
  <dt><br/><b>Memory Interleaving</b></dt>
  <dd>
    Default: Auto
    <br />
    <br />
    Memory interleaving is supported if a symmetric memory configuration is installed.
    When set to Disabled, the system supports Non-Uniform Memory Access (NUMA) (asymmetric) memory configurations.

    Operating Systems that are NUMA-aware understand the distribution of memory in a particular system and can
    intelligently allocate memory in an optimal manner. Operating Systems that are not NUMA-aware could allocate
    memory to a processor that is not local, resulting in a loss of performance. Die and Socket interleaving should
    only be enabled for Operating Systems that are not NUMA-aware.
  </dd>

  <dt><br/><b>DIMM Self Healing on Uncorrectable Memory Error</b></dt>
  <dd>
    Default: Enabled
    <br />
    <br />
    Post Package Repair (PPR) on Uncorrectable Memory Error
    <br />
    Disabling this feature may improve memory performance for some workloads. 
  </dd>

  <dt><br/><b>Correctable Memory ECC SMI</b></dt>
  <dd>
    Default: Enabled
    <br />
    <br />
    Allows the system to log ECC-corrected DRAM errors into the SEL log. Logging these rare errors
    can help identify marginal components; however, the system pauses for a few milliseconds
    after an error while the log entry is created. Latency-conscious customers may wish to disable
    this feature. Spare mode and Mirror mode require this feature to be enabled.
  </dd>

  <dt><br/><b>Logical Processor</b></dt>
  <dd>
    Default: Enabled
    <br />
    <br />
    Each processor core supports up to two logical processors. When set to Enabled, the BIOS
    reports all logical processors. When set to Disabled, the BIOS only reports one
    logical processor per core. Generally, higher processor count results in increased
    performance for most multi-threaded workloads and the recommendation is to keep this enabled.
    However, there are some floating point/scientific workloads, including HPC workloads, where
    disabling this feature may result in higher performance.
  </dd>
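From the OS side, the effect of this setting can be observed (and, on recent kernels, SMT can even be toggled at runtime) without entering BIOS setup; a sketch assuming Linux:

```shell
# "Thread(s) per core: 2" indicates Logical Processor (SMT) is enabled
lscpu | grep 'Thread(s) per core'

# Runtime SMT control on kernels that support it (4.19 and later)
cat /sys/devices/system/cpu/smt/control         # on, off, forceoff, or notsupported
echo off > /sys/devices/system/cpu/smt/control  # disable SMT until reboot (root)
```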

  <dt><br/><b>Virtualization Technology</b></dt>
  <dd>
    Default: Enabled
    <br />
    <br />
    When set to Enabled, the BIOS will enable processor Virtualization features and provide the virtualization
    support to the Operating System (OS) through the DMAR table. In general, only virtualized environments
    such as VMware(r) ESX (tm), Microsoft Hyper-V(r) , Red Hat(r) KVM, and other virtualized operating systems
    will take advantage of these features. Disabling this feature is not known to significantly alter the
    performance or power characteristics of the system, so leaving this option Enabled is advised for most cases.
  </dd>

  <dt><br/><b>L1 Stream HW Prefetcher</b></dt>
  <dd>
    Default: Enabled
    <br />
    <br />
    When set to Enabled, the L1 stream HW prefetcher is active. The recommended setting (Enabled)
    optimizes performance for most workloads.
  </dd>

  <dt><br/><b>L2 Stream HW Prefetcher</b></dt>
  <dd>
    Default: Enabled
    <br />
    <br />
    When set to Enabled, the L2 stream HW prefetcher is active. The recommended setting (Enabled)
    optimizes performance for most workloads.
  </dd>

  <dt><br/><b>L1 Stride Prefetcher</b></dt>
  <dd>
    Default: Enabled
    <br />
    <br />
    When set to Enabled, the L1 stride prefetcher issues additional fetches based on the stride of an
    individual instruction's data accesses. The recommended setting (Enabled) optimizes performance
    for most workloads.
  </dd>

  <dt><br/><b>L1 Region Prefetcher</b></dt>
  <dd>
    Default: Enabled
    <br />
    <br />
    When set to Enabled, the L1 region prefetcher fetches additional data adjacent to the data accessed
    by a given instruction. The recommended setting (Enabled) optimizes performance for most workloads.
  </dd>
  
  <dt><br/><b>L2 Up Down Prefetcher</b></dt>
  <dd>
    Default: Enabled
    <br />
    <br />
    When set to Enabled, the L2 up/down prefetcher uses memory-access history to determine whether to
    fetch the next or the previous line for each memory access. The recommended setting (Enabled)
    optimizes performance for most workloads.
  </dd>
    
  
  <dt><br/><b>NUMA Nodes per Socket</b></dt>
  <dd>
    Default: 1 
    <br />
    <br />
    Allows configuration of the memory NUMA domains per socket. The configuration can consist of one whole domain (NPS1),
    two domains (NPS2), or four domains (NPS4).
    <br />
    In the case of two-socket platforms, an additional NPS profile is available to have whole system memory be
    mapped as a single NUMA domain (NPS0).
  </dd>
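The resulting topology can be verified from Linux; as a sketch, an NPS4 single-socket system should report four NUMA nodes:

```shell
# Show NUMA nodes, their CPUs, memory sizes, and inter-node distances
numactl --hardware

# Node count only
lscpu | grep 'NUMA node(s)'
```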

  <dt><br/><b>L3 Cache as NUMA Domain </b></dt>
  <dd>
    Default: Disabled
    <br />
    <br />
    This field specifies that each CCX within the processor will be declared as a NUMA domain.
  </dd>

  <dt><br/><b>ACPI CST C2 Latency</b></dt>
  <dd>
    Default: 800
    <br />
    <br />
    Enter a value between 18 and 1000 microseconds (decimal).
    Larger C2 latency values reduce the number of C2 transitions and reduce C2 residency.
    Fewer transitions can help when performance is sensitive to the latency of C2 entry and exit.
    Higher residency can improve performance by allowing higher frequency boost and reducing idle core power.
    With Linux kernel 6.0 or later, the C2 transition cost is significantly reduced.
    The best value will be dependent on kernel version, use case, and workload.
  </dd>

  <dt><br/><b>System Profile</b></dt>
  <dd>
    Default: Performance Per Watt (OS)
    <br />
    <br />
    When set to a mode other than Custom, BIOS will set each option accordingly. When set to Custom, each option
    setting can be changed.
  </dd>

  <dt><br/><b>CPU Power Management</b></dt>
  <dd>
    Default: OS DBPM
    <br />
    <br />
    Allows selection of CPU power management methodology.
    <ul>
      <li>Maximum Performance: typically selected for performance-centric workloads where it is
           acceptable to consume additional power to achieve the highest possible performance for the computing environment.
           This mode drives processor frequency to the maximum across all cores (although idled cores can still be
           frequency reduced by C-state enforcement through BIOS or OS mechanisms if enabled). This mode also offers
           the lowest latency of the CPU Power Management Mode options, so it is always preferred for
           latency-sensitive environments.</li>
      <li>OS DBPM: another performance-per-watt option that relies on the operating system to dynamically control
           individual core frequencies. Both Windows and Linux can take advantage of this mode to reduce the frequency of idle
           or underutilized cores in order to save power.</li>
    </ul>
  </dd>

  <dt><br/><b>C-States</b></dt>
  <dd>
    Default: Enabled
    <br />
    <br />
    C-States allow the processor to enter lower power states when idle.
    <br />
    When set to Enabled (OS Controlled) or when set to Autonomous (if Hardware control is supported), the processor
    can operate in all available Power States to save power, but may increase memory latency and frequency jitter.
  </dd>
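On Linux, the C-states actually exposed to the OS can be examined with the cpupower utility or directly in sysfs; a sketch:

```shell
# Per-state names, latencies, and residency for the idle driver
cpupower idle-info

# Raw sysfs view of the same information for CPU 0
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/latency
```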

  <dt><br/><b>Memory Patrol Scrub</b></dt>
  <dd>
    Default: Standard
    <br />
    <br />
    Patrol Scrubbing searches the memory for errors and repairs correctable errors to prevent
    the accumulation of memory errors.
      <ul>
         <li>Disabled: no patrol scrubbing will occur.</li>
	 <li>Standard: the entire memory array will be scrubbed once in a 24-hour period.</li>
	 <li>Extended: the entire memory array will be scrubbed more frequently to further increase system reliability.</li>
      </ul>
  </dd>

  <dt><br/><b>PCI ASPM L1 Link Power Management</b></dt>
  <dd>
    Default: Enabled
    <br />
    <br />
    When enabled, PCIe Active State Power Management (ASPM) can reduce overall system power somewhat while slightly reducing
    system performance. NOTE: Some devices may not perform properly (they may hang or cause the system to hang) when ASPM is
    enabled. For this reason, L1 will only be enabled for validated, qualified cards.
  </dd>

  <dt><br/><b>Periodic Directory Rinse (PDR) Tuning</b></dt>
  <dd>
    Default: Auto
    <br />
    <br />
    Controls PDR settings that may impact the workload and processor performance
      <ul>
         <li>Auto: Same as Blended</li>
	 <li>Periodic (RefClock Based Floss Only): Rate based Directory Rinse.</li>
         <li>Blended (Cache Load Based Floss with Background RefClock Based Floss): Demand based Directory Rinse.</li>
      </ul>
  </dd>


  <dt><br/><b>Determinism Control</b></dt>
  <dd>
    Default: Auto
    <br />
    <br />
    Set to Manual to enable Determinism Slider Control. Read-only unless System Profile is set to Custom.
      <ul>
         <li>Auto: Use default performance determinism settings.</li>
	 <li>Manual: Specify custom power/performance determinism.</li>
      </ul>
  </dd>

  <dt><br/><b>Determinism Slider</b></dt>
  <dd>
    Default: Performance Determinism
    <br />
    <br />
    Controls whether BIOS will enable determinism to control performance. Read-only unless System Profile is set to Custom and Determinism Control is set to Manual.
      <ul>
         <li>Performance: Workload performance is the same regardless of variations in the environment and silicon.</li>
	 <li>Power: Maximizes workload performance to part-specific power limits, thereby tapping the additional performance headroom based on the silicon. Maximum performance can be obtained by setting the TDP and Package Power Limit (PPL) to the maximum TDP value supported by the CPU.</li>
      </ul>
  </dd>

  <dt><br/><b>Optimizer Mode</b></dt>
  <dd>
    Default: Disabled
    <br />
    <br />
    Allows automatic tuning that maximizes the processor's performance based on the system configuration and thermal environment. Requires the system to be configured in Power Determinism mode.
      <ul>
         <li>Enabled: Enables the feature.</li>
         <li>Disabled: Turns off the feature.</li>
      </ul>
  </dd>

  <dt><br/><b>DF CState</b></dt>
  <dd>
    Default: Enabled
    <br />
    <br />
    This field enables/disables DF CState.
      <ul>
         <li>Enabled: Enables DF CState.</li>
         <li>Disabled: Disables DF CState.</li>
      </ul>
    <br />
    From AMD Tuning Guide:
      <ul>
         <li>Disabled: Prevents the AMD Infinity Fabric from entering a low-power state.</li>
         <li>Enabled: Allows the AMD Infinity Fabric to enter a low-power state.</li>
      </ul>
    (<a href="https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/tuning-guides/58467_amd-epyc-9005-tg-bios-and-workload.pdf#page=11">AMD EPYC 9005 BIOS and Workload Tuning Guide, p. 11</a>)
  </dd>

  <dt><br/><b>CPU Interconnect Bus Link Power Management</b></dt>
  <dd>
    Default: Enabled
    <br />
    <br />
    When Enabled, CPU interconnect bus link power management can reduce overall system power a
    bit while slightly reducing system performance.
  </dd>

  <dt><br/><b>Algorithm Performance Boost Disable (ApbDis)</b></dt>
  <dd>
    Default: Disabled
    <br />
    <ul>
       <li>Enabled: a specific hard-fused Data Fabric (SoC) P-state is forced for optimizing workloads
                    sensitive to latency or throughput. (For higher performance) </li>
       <li>Disabled: P-states will be automatically managed by the Application Power Management, 
                     allowing the processor to provide maximum performance while remaining within a specified
                     power-delivery and thermal envelope. (For power savings) </li>
    </ul>
  </dd>

  <dt><br/><b>CPPC</b></dt>
  <dd>
    Default: Auto
    <br />
    <ul>
       <li>Auto: Same as Enabled</li>
       <li>Enabled: Allows the OS to make performance/power optimization requests using ACPI (Advanced
                    Configuration and Power Interface) CPPC (Collaborative Processor Performance Control).</li>
       <li>Disabled: Prevents the OS from making performance/power optimization requests using ACPI CPPC.</li>
    </ul>
  </dd>
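On Linux, whether the kernel is actually using CPPC can be checked through the cpufreq driver; this sketch assumes a kernel with amd-pstate support (driver since 5.17, status file since roughly 6.1):

```shell
# amd-pstate indicates the kernel is driving frequency via CPPC;
# acpi-cpufreq means it is falling back to ACPI P-state tables
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver

# amd-pstate operating mode on kernels that expose it
cat /sys/devices/system/cpu/amd_pstate/status
```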

  <dt><br/><b>Adaptive Allocation (AA)</b></dt>
  <dd>
    Default: Auto
    <br />
    <ul>
       <li>Auto: Same as Disabled</li>
       <li>Enabled: Dynamically alters cache replacement and allocation policy based on application behaviors.</li>
       <li>Disabled: Uses a fixed L2 replacement/allocation policy, which may benefit highly-optimized, cache-aware codes.</li>
    </ul>
  </dd>

  <dt><br/><b>Fan Speed Offset</b></dt>
  <dd>
    Default: Off
    <br />
    <br />
    Configuring this option allows additional cooling for the server. If hardware is added (for example, new PCIe cards),
    it may require additional cooling.
    A fan speed offset causes fan speeds to increase (by the offset % value) over baseline fan speeds calculated
    by the Thermal Control algorithm.
  </dd>
</dl>
 
]]> 
</firmware>

</flagsdescription>
