<?xml version="1.0"?>
<!DOCTYPE flagsdescription SYSTEM
"http://www.spec.org/dtd/cpuflags2.dtd">

<flagsdescription>

   <filename>GIGA-BYTE-platform-settings-AmpereOneM-rev.1</filename>

   <title>SPEC CPU2026 Platform Settings for GIGA-BYTE Ampere AmpereOneM Systems</title>

   <os_tuning>
      <![CDATA[

      <p style="margin:1em 3em; width:400px; padding:2px; border:thin solid red;"><b>Note:</b>
      This page provides definitions for a variety of <b>possible</b> settings.
      Please see the SPEC CPU result page to find out what settings were <b>actually</b> used.  </p>

      <p>Many of the settings below are defined in more detail at
      <br /><a href="https://www.kernel.org/doc/Documentation/sysctl/vm.txt">
            https://www.kernel.org/doc/Documentation/sysctl/vm.txt</a>
      <br /><a href="https://www.kernel.org/doc/Documentation/sysctl/kernel.txt">
            https://www.kernel.org/doc/Documentation/sysctl/kernel.txt</a>
      <br /><a href="https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt">
            https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt</a>
      </p>

      <ul>

         <li><p><b>cpupower frequency-set</b>: Adjusts the clock frequency of the CPUs on the system,
            sets limits for them, or selects a "scaling governor".  For example,
         <br /><samp>cpupower frequency-set -g performance</samp> &nbsp;&nbsp;&nbsp;selects higher frequencies
            at the cost of additional power usage;
         <br /><samp>cpupower frequency-set -g powersave</samp> &nbsp;&nbsp;&nbsp;does the opposite.</p></li>

         <li><p><b>dirty_ratio</b>: Sets the threshold at which processes will begin writing dirty (modified)
            pages to disk.  The dirty_ratio is expressed as a percentage of total available memory.
         <br />For example, this command sets the threshold to 8%
         <br /><samp>echo 8 &gt; /proc/sys/vm/dirty_ratio</samp> </p></li>

         <li><p style="margin-bottom:.1em;"><b>drop_caches</b>: Reduces the
            size of the page cache and kernel slab objects</p>
         <ul style="margin-top:.1em;">
            <li>1 - frees the page cache</li>
            <li>2 - frees slab objects</li>
            <li>3 - frees both</li>
         </ul>
         <p>Example to free both:
         <br /><samp>echo 3 &gt; /proc/sys/vm/drop_caches</samp></p>
         </li>
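         <p>A sketch of a typical sequence (assumption: running as root; <samp>sync</samp> is run first because
         drop_caches only discards clean pages):</p>

```shell
# Flush dirty pages first so that more of the page cache is clean and freeable;
# drop_caches itself never writes dirty pages back.
sync && synced=yes

# Writing the control file requires root; guard so the sketch degrades gracefully.
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches    # free both page cache and slab objects
else
    echo "need root to write /proc/sys/vm/drop_caches"
fi
```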

         <li><p><b>numa_balancing</b>: Automatically move memory to nodes that are accessing it.
            This is done by un-mapping and re-mapping pages, which may incur unwanted overhead if processes
            are already bound to the desired memory nodes.
         <br />For example, to disable numa balancing, one could use:
         <br /><samp>echo 0 &gt; /proc/sys/kernel/numa_balancing</samp> </p>
         </li>

         <li><p><b>numactl</b> Controls NUMA policy for individual processes.  There are many options, as defined at
         <a href="https://man7.org/linux/man-pages/man8/numactl.8.html">
               https://man7.org/linux/man-pages/man8/numactl.8.html</a>.
                  Options useful for workloads similar to SPEC CPU may include:</p>
               <ul>
                  <li><b>--interleave=nodes</b> - allocate memory round-robin across the specified NUMA nodes</li>
                  <li><b>--localalloc </b> - attempt to allocate memory on the same node as the process; if it fails,
                     allocate elsewhere</li>
                  <li><b>--membind=node</b> - force allocation from specified node</li>
                  <li><b>--physcpubind=cpu</b> - run process on specified CPU</li>
               </ul>
               <p style="margin-bottom:.1em;">Note that the SPEC CPU config file may use config file preprocessing and/or
                  shell mathematics to compute the desired memory location or desired CPU number.
               <br />For example, these commands pick a memory node by dividing the copy number by the number of
                  CPUs per node:</p>
               <pre style="margin-top:.1em;margin-left:3em;">%define numasize 20
                  numactl --membind=`expr $SPECCOPYNUM / %{numasize}` --physcpubind=$SPECCOPYNUM </pre>
         </li>
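         <p>A minimal sketch of the arithmetic above (assumptions: 20 CPUs per node, and a sample copy
         number standing in for the <samp>$SPECCOPYNUM</samp> that SPEC CPU sets for each copy):</p>

```shell
# Hypothetical values: in a real run, SPEC CPU exports SPECCOPYNUM per copy.
SPECCOPYNUM=45
numasize=20                               # assumed CPUs per NUMA node (the %define above)
node=`expr $SPECCOPYNUM / $numasize`      # integer division picks the node
echo "copy $SPECCOPYNUM runs on node $node"
```

         <p>So copies 0-19 land on node 0, copies 20-39 on node 1, and so on.</p>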

         <li><p><b>swappiness</b> Controls how aggressively the kernel swaps memory pages.
            The values can range from 0 to 100.  Low values decrease the amount of swapping.
         <br />For example, this command indicates that swapping should occur only when essential:
         <br /><samp>echo 1 &gt; /proc/sys/vm/swappiness</samp> </p>
         </li>

         <li><p><b>transparent_hugepage</b> Transparent huge pages may provide a performance benefit by
            reducing kernel time spent looking up page locations.  The potential benefit is application dependent, and
            some applications may do better with smaller pages.  </p>
         <ul>
            <li>To enable: <samp>echo always &gt; /sys/kernel/mm/transparent_hugepage/enabled</samp></li>
            <li>To disable: <samp>echo never &gt; /sys/kernel/mm/transparent_hugepage/enabled</samp></li>
            <li>More info (and more options):
         <a href="https://www.kernel.org/doc/Documentation/vm/transhuge.txt">
               https://www.kernel.org/doc/Documentation/vm/transhuge.txt</a></li>
         </ul>
         </li>
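         <p>The current mode can be read back from the same file; the kernel brackets the active choice.
         A small sketch of parsing that output (the sample string is illustrative):</p>

```shell
# Sample of: cat /sys/kernel/mm/transparent_hugepage/enabled
state="always madvise [never]"
# Extract the bracketed (active) mode.
current=`echo "$state" | sed 's/.*\[\(.*\)\].*/\1/'`
echo "active THP mode: $current"
```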

         <li><p style="margin-bottom:.1em;"><b>tuned-adm</b> Controls <samp>tuned</samp>, the dynamic adaptive system
            tuning daemon.  Commonly, one may load a tuning profile, for example:</p>
         <ul style="margin-top:.1em;">
            <li><samp>tuned-adm profile powersave</samp> &nbsp;&nbsp;&nbsp;- conserve power at the possible
               expense of performance</li>
            <li><samp>tuned-adm profile throughput-performance</samp> &nbsp;&nbsp;&nbsp;- select a broadly
               applicable profile</li>
            <li><samp>tuned-adm profile latency-performance</samp> &nbsp;&nbsp;&nbsp;- optimize for deterministic
               performance at the cost of increased power consumption.</li>
         </ul>
         <p>It is also possible to disable all profiles, using:
         <br /><samp>tuned-adm off</samp></p>
         </li>
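         <p>The currently loaded profile can be checked with <samp>tuned-adm active</samp>.
         A sketch of reading its one-line report (sample output shown; the actual profile name
         depends on the system):</p>

```shell
# Sample of: tuned-adm active
report="Current active profile: throughput-performance"
profile=${report##*: }
echo "profile in use: $profile"
```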

         <li><p><b>ulimit -s [n | unlimited]</b>: Allow the stack size to grow to <samp>n</samp> kbytes, or
         <samp>unlimited</samp> to impose no limit.</p> </li>
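         <p>A small sketch; querying the limit needs no privileges, while raising it beyond the hard
         limit may (hence the guard):</p>

```shell
# Record the current soft stack limit (in kbytes, or "unlimited").
before=`ulimit -s`
# Attempt to lift the limit for this shell only; this can fail for
# unprivileged users if the hard limit is lower.
ulimit -s unlimited 2>/dev/null || echo "could not raise the stack limit"
echo "stack limit was: $before"
```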

         <li><p style="margin-bottom:.1em;"><b>zone_reclaim_mode</b> Provides control over memory
            allocation when multiple NUMA nodes are active.  If zone reclaim is off, data files may be cached
            on any node.  There are three settings that can be ORed together:</p>
               <ul style="margin-top:.1em;">
                  <li>1 - Reclaim easily reusable pages before allocating off-node pages.</li>
                  <li>2 - Write dirty pages if needed in order to free enough space on the current node.</li>
                  <li>4 - Swap pages if needed.</li>
               </ul>
               <p>For example, this command enables reclaiming:
               <br /><samp>echo 1 &gt; /proc/sys/vm/zone_reclaim_mode</samp></p>
         </li>
      </ul>

   ]]>
   </os_tuning>


   <firmware>
      <![CDATA[
      <dl>

         <dt><b>ANC mode:</b></dt>
         <dd>Ampere NUMA Control (ANC) specifies the number of desired NUMA (Non-Uniform Memory Access)
            nodes per chip:
            <ul>
               <li>monolithic: Each physical processor chip is a NUMA node (default)</li>
               <li>hemisphere: Each physical processor chip is two NUMA nodes</li>
               <li>quadrant: Each physical processor chip is four NUMA nodes</li>
            </ul>
            <p>
               Dividing the chip into separate nodes (hemisphere or quadrant) may improve latency to
               the last level cache and main memory, which may benefit overall performance for NUMA-aware
               operating systems and workloads.
            </p>
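            <p>
               After changing the ANC mode, the resulting node count is visible from the OS, for
               example in the first line of <samp>numactl --hardware</samp>. A sketch of checking it
               (the sample line shows a hypothetical single-chip system in quadrant mode):
            </p>

```shell
# Sample of: numactl --hardware | head -n 1   (quadrant mode, one chip)
hwline="available: 4 nodes (0-3)"
nodes=`echo "$hwline" | awk '{print $2}'`
echo "NUMA nodes: $nodes"
```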

         </dd>

      </dl>

         ]]>
   </firmware>

   <parts>
      <![CDATA[

      <h3>jemalloc </h3>
         <div style="margin-left:2em;">
            <p>The jemalloc memory allocation library can speed up memory allocation, in
               part by keeping lists of commonly used sizes.
               The library includes various configuration options, which are documented at
            <a href="http://jemalloc.net/jemalloc.3.html">http://jemalloc.net/jemalloc.3.html</a> and
               in its file
            <samp>INSTALL.md</samp> as found in the distribution tar file, and as posted at
            <a href="https://github.com/jemalloc/jemalloc/blob/master/INSTALL.md">
                  https://github.com/jemalloc/jemalloc/blob/master/INSTALL.md</a>.
         </p>
         <p style="margin-bottom:.1em;">Some of the useful options include:</p>
         <ul style="margin-top:.1em;">
            <li><b>--enable-prof</b> - enable debug / profiling features</li>
            <li><b>--prefix</b> - destination directory</li>
         <li><b>--with-lg-quantum=size</b>  - base 2 log of minimum allocation.  For example, setting this
            to 3 implies a minimum allocation of 8 bytes, which will cause jemalloc to provide
            additional size classes that are not 16-byte-aligned (24, 40, and 56).</li>
            <li><b>--with-lg-page=size</b>  - Specify the base 2 log of the system page size.
               This option is only useful when cross compiling, since the configure script
               automatically determines the host's page size by default. Affects memory allocation
               efficiency and fragmentation.</li>
         </ul>

         <p style="margin-bottom:.1em;">Example configuration:</p>
         <pre style="margin-top:.1em; margin-left:3em;">$ wget https://github.com/jemalloc/jemalloc/releases/download/5.3.0/jemalloc-5.3.0.tar.bz2
$ bzip2 -dc jemalloc-5.3.0.tar.bz2 | tar -xf -
$ cd jemalloc-5.3.0/
$ ./configure --prefix=/usr/local/jemalloc-530
$ make -j30
$ sudo make install </pre>
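         <p style="margin-bottom:.1em;">One common way to use the library without relinking is
         <samp>LD_PRELOAD</samp>. A hypothetical usage sketch (the path follows the
         <samp>--prefix</samp> above; the benchmark binary name is illustrative):</p>

```shell
# Path implied by --prefix=/usr/local/jemalloc-530 in the build above.
JEMALLOC_LIB=/usr/local/jemalloc-530/lib/libjemalloc.so
# Hypothetical invocation; LD_PRELOAD makes the dynamic loader resolve
# malloc/free from jemalloc before the system allocator.
echo "would run: LD_PRELOAD=$JEMALLOC_LIB ./benchmark"
```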


         </div>

      ]]>
   </parts>


</flagsdescription>
