Using Arrays in YAML

In YAML, an array (or list) is an ordered collection of elements. These elements can be of various types, such as strings, numbers, mappings (objects), or even other arrays. Arrays are typically used to group related items that belong together, such as hosts, tasks, or ports.

💡
Arrays in YAML are formally called “sequences”. In block style, each list item is introduced by a dash followed by a space (- ).
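
As a quick illustration of how a single sequence can hold elements of different types, here is a minimal sketch. The key name mixed_items and all values are made up for this example.

```yaml
# Hypothetical key; one sequence holding several element types
mixed_items:
  - "a string"
  - 42                # an integer
  - 3.14              # a float
  - true              # a boolean
  - { role: admin }   # a mapping (object)
  - [nested, list]    # another sequence
```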

In this article, we will explore different ways to represent arrays in YAML, discuss their syntax, and provide real-world examples for better understanding.

Various Syntaxes for Arrays in YAML

Here are the main ways to define arrays in YAML.

1. Dash-Style (Block) Arrays

The most common way to represent an array in YAML is by using a list of items prefixed with a dash. Each dash represents one element in the sequence. This style is easy to read and write, and is commonly used in configuration files such as those for Ansible, Kubernetes, or CI/CD pipelines.

Example:

# A list of servers in a DevOps configuration file
servers:
  - web01.example.com
  - web02.example.com
  - db01.example.com

In the above example, servers is a sequence containing three items: web01.example.com, web02.example.com, and db01.example.com.
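
Two details of dash-style lists are easy to trip over: the indentation of the dashes relative to the parent key, and the space after each dash. The sketch below shows both, using made-up key names (servers_indented, servers_flush, not_a_list).

```yaml
# Indented under the key (the style used in this article)
servers_indented:
  - web01.example.com
  - web02.example.com

# Flush with the key: also valid YAML, and it parses to the same structure
servers_flush:
- web01.example.com
- web02.example.com

# Pitfall: without a space after the dash, this is NOT a list item
not_a_list: -web01.example.com   # parsed as the plain string "-web01.example.com"
```

Whichever indentation you choose, it is usually best to use it consistently within a file.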

2. Inline Arrays (Flow Style)

YAML also supports "flow style," which is similar to JSON arrays and uses square brackets. This can be useful if you prefer more compact representations or if you have shorter lists that you want to keep on a single line.

Example:

# Web server configurations inline
web_servers: [web01.example.com, web02.example.com, web03.example.com]

While this style saves space, it may reduce readability for longer lists. It is most often used for short values, or when embedding a small array inside a larger, more complex data structure.
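
A few flow-style details are worth seeing side by side. In the sketch below, the key names (ports, labels, empty_list) are hypothetical. Because commas separate entries in flow style, any entry that itself contains a comma needs quotes.

```yaml
# Flow style: unquoted numbers stay numbers
ports: [80, 443, 8080]

# Quotes are needed here because each entry contains a comma
labels: ["east, primary", "west, backup"]

# An empty sequence is written with empty brackets
empty_list: []
```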

3. Mixed Styles and Nested Arrays

YAML allows for arrays within arrays or arrays within dictionaries (mappings). These can be represented in block style or flow style. It’s perfectly valid to mix styles as long as the nesting is clear and consistent.

Example:

environments:
  - name: production
    servers:
      - prod-web01
      - prod-web02
      - prod-db01
  - name: staging
    servers:
      - stage-web01
      - stage-db01

Here, we have an array (environments) containing dictionaries ({name: value, servers: [...]}), and within those dictionaries, we have another array (servers). Each dictionary item is structured and indented to reflect the hierarchy.

Flow Style Nested Example:

environments: [
  { name: production, servers: [prod-web01, prod-web02, prod-db01] },
  { name: staging, servers: [stage-web01, stage-db01] }
]

While this is more compact, it might be less readable for larger configurations.
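
The examples above show arrays inside dictionaries. For arrays directly inside arrays, a minimal sketch could look like the following, where grid_block and grid_mixed are made-up keys holding the same data in two styles.

```yaml
# Fully block style: each outer item is itself a sequence
grid_block:
  - - 1
    - 2
    - 3
  - - 4
    - 5
    - 6

# Mixed: block style for the outer sequence, flow style for the inner ones
grid_mixed:
  - [1, 2, 3]
  - [4, 5, 6]
```

The mixed form is often the easiest to scan when the inner lists are short.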

Multi-line Arrays in YAML

For large arrays or arrays of complex items, placing all items on a single line quickly becomes unwieldy. YAML’s indentation-based formatting lends itself well to multi-line arrays, which stay readable and maintainable even when the items are long strings, code snippets, or configuration blocks.

Multi-line Arrays with Dash Notation

Even basic dash-style arrays can be thought of as multiline arrays. For example:

tasks:
  - name: Install Apache
    command: apt-get install apache2 -y

  - name: Start Apache service
    command: systemctl start apache2

  - name: Ensure Apache is enabled
    command: systemctl enable apache2

This is a classic example often found in Ansible playbooks. Each item in the tasks array is a dictionary (mapping) that spans multiple lines. Each line is clearly associated with the list item above it due to indentation.

Using Block Scalars within Arrays

YAML provides block scalars (| for literal style and > for folded style) that can be used to include multi-line text as a single value. This is useful when your array items are large strings such as logs, configurations, or documentation excerpts.

Example:

config_files:
  - |
    # This is a configuration file
    # stored as a multi-line string in YAML.
    server {
      listen 80;
      server_name example.com;
    }
  - |
    # Another configuration file
    server {
      listen 443 ssl;
      server_name example.com;
      ssl_certificate /path/to/cert.pem;
      ssl_certificate_key /path/to/key.pem;
    }

In the above example, each array element under config_files is a multi-line string (block scalar). The | symbol indicates that line breaks within the scalar are significant and will be preserved.
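
For comparison, here is a small sketch of the literal style next to the folded style and the "strip" chomping indicator (|-). The key name messages is hypothetical, and the comments describe the string each item parses to.

```yaml
messages:
  # Literal (|): newlines are kept -> "Line one\nLine two\n"
  - |
    Line one
    Line two
  # Folded (>): the single line break becomes a space
  # -> "This long sentence is written across two source lines.\n"
  - >
    This long sentence is written
    across two source lines.
  # Literal with strip (|-): no trailing newline -> "No trailing newline here"
  - |-
    No trailing newline here
```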

Large Lists and Multiline Formatting

When dealing with very large lists (for example, a large list of users, IP addresses, or inventory items), readability can suffer. By spreading them across multiple lines and ensuring proper indentation, you can maintain clarity:

Example:

ip_addresses:
  - 192.168.1.101
  - 192.168.1.102
  - 192.168.1.103
  - 192.168.1.104
  - 192.168.1.105

While this is still fairly simple, imagine if each IP address item needed additional keys or metadata. The multiline approach would still make the structure easy to follow.
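
A rough sketch of that richer version might look like this. The field names (address, hostname, tags) and their values are assumptions chosen for illustration, not part of any particular tool's schema.

```yaml
ip_addresses:
  - address: 192.168.1.101
    hostname: app01            # hypothetical metadata
    tags: [web, production]
  - address: 192.168.1.102
    hostname: app02
    tags: [web, production]
  - address: 192.168.1.103
    hostname: db01
    tags: [database]
```

Each dash still marks one list item, so the list stays easy to scan even as items grow.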

Real-World Use Cases

Here are a few places where YAML arrays show up in practice:

Infrastructure as Code (IaC) Scenarios

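Ansible relies heavily on YAML for playbooks, inventories, and variables, and other tools use it for parts of their configuration (for example, Hiera data in Puppet and kitchen.yml for Chef's Test Kitchen). Arrays come into play when listing hosts, tasks, variables, and roles. The snippet below is a generic illustration rather than the schema of any particular tool: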

hosts:
  - name: database_servers
    nodes:
      - db01.example.com
      - db02.example.com
      - db03.example.com

Kubernetes Manifests

Kubernetes manifests are written in YAML, and arrays are a key component when specifying multiple containers in a single pod, multiple environment variables, or multiple ports:

apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
spec:
  containers:
    - name: web-container
      image: nginx:latest
      ports:
        - containerPort: 80
    - name: sidecar-container
      image: alpine:latest
      command: ["sh", "-c", "echo Hello"]

Here, containers is an array, and each container can have its own arrays for ports, environment variables, and other details.
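
As a sketch of those per-container arrays, the manifest below adds env and a second port to one container. The pod name, variable names, and values are made up for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-pod-env           # hypothetical name
spec:
  containers:
    - name: web-container
      image: nginx:latest
      env:                       # array of name/value mappings
        - name: LOG_LEVEL
          value: "info"
        - name: WORKERS
          value: "4"             # env values must be strings, so the number is quoted
      ports:                     # array of port mappings
        - containerPort: 80
        - containerPort: 443
```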

CI/CD Pipeline Configuration

Many CI/CD platforms (e.g., GitHub Actions, GitLab CI, CircleCI) use YAML to define workflows. Arrays often represent steps in the pipeline:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install dependencies
        run: npm install
      - name: Run tests
        run: npm test

The steps key is an array, and each step is a dictionary with its own keys (uses, name, run, etc.).
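
The same idea carries over to other platforms. As a rough GitLab CI sketch, both stages and each job's script are arrays; the job names build-job and test-job are placeholders.

```yaml
# .gitlab-ci.yml (sketch)
stages:            # ordered list of pipeline stages
  - build
  - test

build-job:
  stage: build
  script:          # commands run in order
    - npm install

test-job:
  stage: test
  script:
    - npm test
```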

Conclusion

Arrays are an important part of YAML’s flexible, human-readable structure. Whether you’re working on complex infrastructure-as-code scenarios, large Kubernetes configurations, or simple lists of items, YAML’s array syntax offers multiple ways to represent sequences.

✍️
Author: Hitesh Jethwa has more than 15 years of experience with Linux system administration and DevOps. He likes to explain complicated topics in an easy-to-understand way.