<?xml version="1.0"?>
<!DOCTYPE flagsdescription SYSTEM
"http://www.spec.org/dtd/cpuflags2.dtd">

<flagsdescription>

   <filename>GIGA-BYTE-platform-settings-AmpereOneM-rev.1</filename>

   <title>SPEC CPU2026 Platform Settings for GIGA-BYTE Ampere AmpereOneM Systems</title>

   <os_tuning>
      <![CDATA[

      <p style="margin:1em 3em; width:400px; padding:2px; border:thin solid red;"><b>Note:</b>
      This page provides definitions for a variety of <b>possible</b> settings.
      Please see the SPEC CPU result page to find out what settings were <b>actually</b> used.  </p>

      <p>Many of the settings below are defined in more detail at
      <br /><a href="https://www.kernel.org/doc/Documentation/sysctl/vm.txt">
            https://www.kernel.org/doc/Documentation/sysctl/vm.txt</a>
      <br /><a href="https://www.kernel.org/doc/Documentation/sysctl/kernel.txt">
            https://www.kernel.org/doc/Documentation/sysctl/kernel.txt</a>
      <br /><a href="https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt">
            https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt</a>
      </p>

      <ul>

         <li><p><b>cpupower frequency-set</b>: Adjusts the clock frequency of the CPUs on the system,
            sets limits for them, or selects a "scaling governor".  For example,
         <br /><samp>cpupower frequency-set -g performance</samp> &nbsp;&nbsp;&nbsp;selects higher frequencies
            at the cost of additional power usage;
         <br /><samp>cpupower frequency-set -g powersave</samp> &nbsp;&nbsp;&nbsp;does the opposite.</p></li>

         <li><p><b>dirty_ratio</b>: Sets the threshold at which processes will begin writing dirty (modified)
            pages to disk.  The dirty_ratio is expressed as a percentage of total available memory.
         <br />For example, this command sets the threshold to 8%
         <br /><samp>echo 8 &gt; /proc/sys/vm/dirty_ratio</samp> </p></li>

         <li><p style="margin-bottom:.1em;"><b>drop_caches</b>: Reduces the
            size of the page cache and kernel slab objects</p>
         <ul style="margin-top:.1em;">
            <li>1 - frees the page cache</li>
            <li>2 - frees slab objects</li>
            <li>3 - frees both</li>
         </ul>
         <p>Example to free both:
         <br /><samp>echo 3 &gt; /proc/sys/vm/drop_caches</samp></p>
         </li>
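         <p>A sketch of a typical sequence (assumption: running as root; <samp>sync</samp> is run first because
         drop_caches only discards clean pages):</p>

```shell
# Flush dirty pages first so that more of the page cache is clean and freeable;
# drop_caches itself never writes dirty pages back.
sync && synced=yes

# Writing the control file requires root; guard so the sketch degrades gracefully.
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches    # free both page cache and slab objects
else
    echo "need root to write /proc/sys/vm/drop_caches"
fi
```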

         <li><p><b>numa_balancing</b>: Automatically move memory to nodes that are accessing it.
            This is done by un-mapping and re-mapping pages, which may incur unwanted overhead if processes
            are already bound to the desired memory nodes.
         <br />For example, to disable numa balancing, one could use:
         <br /><samp>echo 0 &gt; /proc/sys/kernel/numa_balancing</samp> </p>
         </li>

         <li><p><b>numactl</b> Controls NUMA policy for individual processes.  There are many options, as defined at
         <a href="https://man7.org/linux/man-pages/man8/numactl.8.html">
               https://man7.org/linux/man-pages/man8/numactl.8.html</a>.
                  Options useful for workloads similar to SPEC CPU may include:</p>
               <ul>
                  <li><b>--interleave=nodes</b> - allocate memory round-robin across the specified NUMA nodes</li>
                  <li><b>--localalloc </b> - attempt to allocate memory on the same node as the process; if it fails,
                     allocate elsewhere</li>
                  <li><b>--membind=node</b> - force allocation from specified node</li>
                  <li><b>--physcpubind=cpu</b> - run process on specified CPU</li>
               </ul>
               <p style="margin-bottom:.1em;">Note that the SPEC CPU config file may use config file preprocessing and/or
                  shell mathematics to compute the desired memory location or desired CPU number.
               <br />For example, these commands pick a memory node by dividing the copy number by the number of
                  CPUs per node:</p>
               <pre style="margin-top:.1em;margin-left:3em;">%define numasize 20
                  numactl --membind=`expr $SPECCOPYNUM / %{numasize}` --physcpubind=$SPECCOPYNUM </pre>
         </li>
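         <p>A minimal sketch of the arithmetic above (assumptions: 20 CPUs per node, and a sample copy
         number standing in for the <samp>$SPECCOPYNUM</samp> that SPEC CPU sets for each copy):</p>

```shell
# Hypothetical values: in a real run, SPEC CPU exports SPECCOPYNUM per copy.
SPECCOPYNUM=45
numasize=20                               # assumed CPUs per NUMA node (the %define above)
node=`expr $SPECCOPYNUM / $numasize`      # integer division picks the node
echo "copy $SPECCOPYNUM runs on node $node"
```

         <p>So copies 0-19 land on node 0, copies 20-39 on node 1, and so on.</p>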

         <li><p><b>swappiness</b> Controls how aggressively the kernel swaps memory pages.
            The values can range from 0 to 100.  Low values decrease the amount of swapping.
         <br />For example, this command indicates that swapping should occur only when essential:
         <br /><samp>echo 1 &gt; /proc/sys/vm/swappiness</samp> </p>
         </li>

         <li><p><b>transparent_hugepage</b> Transparent huge pages may provide a performance benefit by
            reducing kernel time spent looking up page locations.  The potential benefit is application dependent, and
            some applications may do better with smaller pages.  </p>
         <ul>
            <li>To enable: <samp>echo always &gt; /sys/kernel/mm/transparent_hugepage/enabled</samp></li>
            <li>To disable: <samp>echo never &gt; /sys/kernel/mm/transparent_hugepage/enabled</samp></li>
            <li>More info (and more options):
         <a href="https://www.kernel.org/doc/Documentation/vm/transhuge.txt">
               https://www.kernel.org/doc/Documentation/vm/transhuge.txt</a></li>
         </ul>
         </li>
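         <p>The current mode can be read back from the same file; the kernel brackets the active choice.
         A small sketch of parsing that output (the sample string is illustrative):</p>

```shell
# Sample of: cat /sys/kernel/mm/transparent_hugepage/enabled
state="always madvise [never]"
# Extract the bracketed (active) mode.
current=`echo "$state" | sed 's/.*\[\(.*\)\].*/\1/'`
echo "active THP mode: $current"
```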

         <li><p style="margin-bottom:.1em;"><b>tuned-adm</b> Controls <samp>tuned</samp>, the dynamic adaptive system
            tuning daemon.  Commonly, one may load a tuning profile, for example:</p>
         <ul style="margin-top:.1em;">
            <li><samp>tuned-adm profile powersave</samp> &nbsp;&nbsp;&nbsp;- conserve power at the possible
               expense of performance</li>
            <li><samp>tuned-adm profile throughput-performance</samp> &nbsp;&nbsp;&nbsp;- select a broadly
               applicable profile</li>
            <li><samp>tuned-adm profile latency-performance</samp> &nbsp;&nbsp;&nbsp;- optimize for deterministic
               performance at the cost of increased power consumption.</li>
         </ul>
         <p>It is also possible to disable all profiles, using:
         <br /><samp>tuned-adm off</samp></p>
         </li>
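         <p>The currently loaded profile can be checked with <samp>tuned-adm active</samp>.
         A sketch of reading its one-line report (sample output shown; the actual profile name
         depends on the system):</p>

```shell
# Sample of: tuned-adm active
report="Current active profile: throughput-performance"
profile=${report##*: }
echo "profile in use: $profile"
```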

         <li><p><b>ulimit -s [n | unlimited]</b>: Allow the stack size to grow to <samp>n</samp> kbytes, or
         <samp>unlimited</samp> to impose no limit.</p> </li>
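         <p>A small sketch; querying the limit needs no privileges, while raising it beyond the hard
         limit may (hence the guard):</p>

```shell
# Record the current soft stack limit (in kbytes, or "unlimited").
before=`ulimit -s`
# Attempt to lift the limit for this shell only; this can fail for
# unprivileged users if the hard limit is lower.
ulimit -s unlimited 2>/dev/null || echo "could not raise the stack limit"
echo "stack limit was: $before"
```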

         <li><p style="margin-bottom:.1em;"><b>zone_reclaim_mode</b> Provides control over memory
            allocation when multiple NUMA nodes are active.  If zone reclaim is off, data files may be cached
            on any node.  There are three settings that can be ORed together:</p>
               <ul style="margin-top:.1em;">
                  <li>1 - Reclaim easily reusable pages before allocating off-node pages.</li>
                  <li>2 - Write dirty pages if needed in order to free enough space on the current node.</li>
                  <li>4 - Swap pages if needed.</li>
               </ul>
               <p>For example, this command enables reclaiming:
               <br /><samp>echo 1 &gt; /proc/sys/vm/zone_reclaim_mode</samp></p>
         </li>
      </ul>

   ]]>
   </os_tuning>


   <firmware>
      <![CDATA[
      <dl>

         <dt><b>ANC mode:</b></dt>
         <dd>Ampere NUMA Control (ANC) specifies the number of desired NUMA (Non-Uniform Memory Access)
            nodes per chip:
            <ul>
               <li>monolithic: Each physical processor chip is a NUMA node (default)</li>
               <li>hemisphere: Each physical processor chip is two NUMA nodes</li>
               <li>quadrant: Each physical processor chip is four NUMA nodes</li>
            </ul>
            <p>
               Dividing the chip into separate nodes (hemisphere or quadrant) may improve latency to
               the last level cache and main memory, which may benefit overall performance for NUMA-aware
               operating systems and workloads.
            </p>
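            <p>
               After changing the ANC mode, the resulting node count is visible from the OS, for
               example in the first line of <samp>numactl --hardware</samp>. A sketch of checking it
               (the sample line shows a hypothetical single-chip system in quadrant mode):
            </p>

```shell
# Sample of: numactl --hardware | head -n 1   (quadrant mode, one chip)
hwline="available: 4 nodes (0-3)"
nodes=`echo "$hwline" | awk '{print $2}'`
echo "NUMA nodes: $nodes"
```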

         </dd>

      </dl>

         ]]>
   </firmware>

   <parts>
      <![CDATA[

      <h3>jemalloc </h3>
         <div style="margin-left:2em;">
            <p>The jemalloc memory allocation library can speed up memory allocation, in
               part by keeping lists of commonly used sizes.
               The library includes various configuration options, which are documented at
            <a href="http://jemalloc.net/jemalloc.3.html">http://jemalloc.net/jemalloc.3.html</a> and
               in its file
            <samp>INSTALL.md</samp> as found in the distribution tar file, and as posted at
            <a href="https://github.com/jemalloc/jemalloc/blob/master/INSTALL.md">
                  https://github.com/jemalloc/jemalloc/blob/master/INSTALL.md</a>.
         </p>
         <p style="margin-bottom:.1em;">Some of the useful options include:</p>
         <ul style="margin-top:.1em;">
            <li><b>--enable-prof</b> - enable debug / profiling features</li>
            <li><b>--prefix</b> - destination directory</li>
         <li><b>--with-lg-quantum=size</b>  - base 2 log of minimum allocation.  For example, setting this
            to 3 implies a minimum allocation of 8 bytes, which will cause jemalloc to provide
            additional size classes that are not 16-byte-aligned (24, 40, and 56).</li>
            <li><b>--with-lg-page=size</b>  - Specify the base 2 log of the system page size.
               This option is only useful when cross compiling, since the configure script
               automatically determines the host's page size by default. Affects memory allocation
               efficiency and fragmentation.</li>
         </ul>

         <p style="margin-bottom:.1em;">Example configuration:</p>
         <pre style="margin-top:.1em; margin-left:3em;">$ wget https://github.com/jemalloc/jemalloc/releases/download/5.3.0/jemalloc-5.3.0.tar.bz2
$ bzip2 -dc jemalloc-5.3.0.tar.bz2 | tar -xf -
$ cd jemalloc-5.3.0/
$ ./configure --prefix=/usr/local/jemalloc-530
$ make -j30
$ sudo make install </pre>
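         <p style="margin-bottom:.1em;">One common way to use the library without relinking is
         <samp>LD_PRELOAD</samp>. A hypothetical usage sketch (the path follows the
         <samp>--prefix</samp> above; the benchmark binary name is illustrative):</p>

```shell
# Path implied by --prefix=/usr/local/jemalloc-530 in the build above.
JEMALLOC_LIB=/usr/local/jemalloc-530/lib/libjemalloc.so
# Hypothetical invocation; LD_PRELOAD makes the dynamic loader resolve
# malloc/free from jemalloc before the system allocator.
echo "would run: LD_PRELOAD=$JEMALLOC_LIB ./benchmark"
```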


         </div>

      ]]>
   </parts>


</flagsdescription>
