Using Arrays in YAML
In YAML, an array (or list) is an ordered collection of elements. These elements can be of various types, such as strings, numbers, objects, or even other arrays. Arrays are typically used when you need to group related data.
YAML uses a dash (-) to indicate a list item. In this article, we will explore different ways to represent arrays in YAML, discuss their syntax, and provide real-world examples for better understanding.
Various Syntaxes for Arrays in YAML
Here are the main ways to define arrays in YAML.
1. Dash-Style (Block) Arrays
The most common way to represent an array in YAML is by using a list of items prefixed with a dash. Each dash represents one element in the sequence. This style is easy to read and write, and is commonly used in configuration files such as those for Ansible, Kubernetes, or CI/CD pipelines.
Example:
# A list of servers in a DevOps configuration file
servers:
  - web01.example.com
  - web02.example.com
  - db01.example.com
In the above example, servers is a sequence containing three servers: web01.example.com, web02.example.com, and db01.example.com.
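Block-style items are not limited to strings. As a minimal sketch (the key name mixed_values and its contents are made up for illustration), a single list can hold several scalar types, and quoting forces a value to stay a string:

```yaml
# Hypothetical list mixing scalar types
mixed_values:
  - web01.example.com   # string
  - 8080                # integer
  - 3.5                 # float
  - true                # boolean
  - "8080"              # quoted, so it stays a string
  - null                # null value
```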
2. Inline Arrays (Flow Style)
YAML also supports "flow style," which is similar to JSON arrays and uses square brackets. This can be useful if you prefer more compact representations or if you have shorter lists that you want to keep on a single line.
Example:
# Web server configurations inline
web_servers: [web01.example.com, web02.example.com, web03.example.com]
While this style saves space, it may reduce readability for longer lists. It’s a common practice in shorter configuration values or when embedding arrays inline within a larger, more complex data structure.
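One detail worth keeping in mind with flow style is that commas separate the entries, so an item that itself contains a comma has to be quoted. The sketch below uses illustrative key names and values that are not part of the examples above:

```yaml
# Plain items: commas separate the entries
ports: [80, 443, 8080]

# Items that contain a comma are quoted so they are not split in two
admins: ["Doe, Jane", "Roe, Richard"]

# An empty list is written with empty brackets
disabled_plugins: []
```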
3. Mixed Styles and Nested Arrays
YAML allows for arrays within arrays or arrays within dictionaries (mappings). These can be represented in block style or flow style. It’s perfectly valid to mix styles as long as the nesting is clear and consistent.
Example:
environments:
  - name: production
    servers:
      - prod-web01
      - prod-web02
      - prod-db01
  - name: staging
    servers:
      - stage-web01
      - stage-db01
Here, we have an array (environments) containing dictionaries ({name: value, servers: [...]}), and within those dictionaries, we have another array (servers). Each dictionary item is structured and indented to reflect the hierarchy.
Flow Style Nested Example:
environments: [
  { name: production, servers: [prod-web01, prod-web02, prod-db01] },
  { name: staging, servers: [stage-web01, stage-db01] }
]
While this is more compact, it might be less readable for larger configurations.
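The two styles can also be combined in one document. The following sketch keeps the outer list in block style and writes the short inner lists in flow style; the matrix key is a hypothetical addition that shows an array of arrays:

```yaml
# Block-style outer list, flow-style inner lists
environments:
  - name: production
    servers: [prod-web01, prod-web02, prod-db01]
  - name: staging
    servers: [stage-web01, stage-db01]

# An array of arrays: each item is itself a flow-style list
matrix:
  - [1, 2, 3]
  - [4, 5, 6]
```

This tends to be a reasonable middle ground: the hierarchy stays visible through indentation, while short leaf lists stay on one line.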
Multi-line Arrays in YAML
For large arrays or arrays of complex items, placing all items on a single line or using simple dash notation might become unwieldy. YAML’s indentation-based formatting lends itself well to multiline arrays, making them more readable and maintainable, especially when dealing with long strings, code snippets, or configuration blocks.
Multi-line Arrays with Dash Notation
Even basic dash-style arrays can be thought of as multiline arrays. For example:
tasks:
  - name: Install Apache
    command: apt-get install apache2 -y
  - name: Start Apache service
    command: systemctl start apache2
  - name: Ensure Apache is enabled
    command: systemctl enable apache2
This is a classic example often found in Ansible playbooks. Each item in the tasks array is a dictionary (mapping) that spans multiple lines. Each line is clearly associated with the list item above it due to indentation.
Using Block Scalars within Arrays
YAML provides block scalars (| for literal style and > for folded style) that can be used to include multi-line text as a single value. This is useful when your array items are large strings such as logs, configurations, or documentation excerpts.
Example:
config_files:
  - |
    # This is a configuration file
    # stored as a multi-line string in YAML.
    server {
      listen 80;
      server_name example.com;
    }
  - |
    # Another configuration file
    server {
      listen 443 ssl;
      server_name example.com;
      ssl_certificate /path/to/cert.pem;
      ssl_certificate_key /path/to/key.pem;
    }
In the above example, each array element under config_files is a multi-line string (block scalar). The | symbol indicates that line breaks within the scalar are significant and will be preserved.
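To make the difference between the literal and folded styles concrete, here is a small sketch with a hypothetical messages list. The comments describe the string each item produces when parsed:

```yaml
messages:
  # Literal style: line breaks are kept.
  # Value: "Line one\nLine two\n"
  - |
    Line one
    Line two
  # Folded style: single line breaks become spaces.
  # Value: "This text is folded into a single line.\n"
  - >
    This text is folded
    into a single line.
  # Literal style with the strip indicator (-): no trailing newline.
  # Value: "No trailing newline here"
  - |-
    No trailing newline here
```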
Large Lists and Multiline Formatting
When dealing with very large lists (for example, a large list of users, IP addresses, or inventory items), readability can suffer. By spreading them across multiple lines and ensuring proper indentation, you can maintain clarity:
Example:
ip_addresses:
  - 192.168.1.101
  - 192.168.1.102
  - 192.168.1.103
  - 192.168.1.104
  - 192.168.1.105
While this is still fairly simple, imagine if each IP address item needed additional keys or metadata. The multiline approach would still make the structure easy to follow.
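As a sketch of that situation, each item could become a mapping instead of a plain string. The keys address, role, and enabled are hypothetical and chosen only for illustration:

```yaml
ip_addresses:
  - address: 192.168.1.101
    role: web
    enabled: true
  - address: 192.168.1.102
    role: database
    enabled: false
```

Each dash still starts one list item, and the keys indented beneath it belong to that item.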
Real-World Use Cases
Here are some real-world use cases for YAML arrays:
Infrastructure as Code (IaC) Scenarios
Infrastructure provisioning tools such as Ansible rely heavily on YAML for configuration, and tools like Chef and Puppet use it for parts of their configuration and data. Arrays come into play when listing hosts, tasks, variables, and roles:
hosts:
  - name: database_servers
    nodes:
      - db01.example.com
      - db02.example.com
      - db03.example.com
Kubernetes Manifests
Kubernetes manifests are written in YAML, and arrays are a key component when specifying multiple containers in a single pod, multiple environment variables, or multiple ports:
apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
spec:
  containers:
    - name: web-container
      image: nginx:latest
      ports:
        - containerPort: 80
    - name: sidecar-container
      image: alpine:latest
      command: ["sh", "-c", "echo Hello"]
Here, containers is an array, and each container can have its own arrays for ports, environment variables, and other details.
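For instance, a single container entry might carry both a ports array and an env array. The following is a fragment of the containers list rather than a complete manifest, and the variable names and values are illustrative assumptions:

```yaml
# Fragment of a Pod spec; variable names and values are illustrative
containers:
  - name: web-container
    image: nginx:latest
    ports:
      - containerPort: 80
      - containerPort: 443
    env:
      - name: LOG_LEVEL
        value: "info"
      - name: WORKER_COUNT
        value: "4"   # env values are strings, so the number is quoted
```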
CI/CD Pipeline Configuration
Many CI/CD platforms (e.g., GitHub Actions, GitLab CI, CircleCI) use YAML to define workflows. Arrays often represent steps in the pipeline:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install dependencies
        run: npm install
      - name: Run tests
        run: npm test
The steps key is an array, each step being a dictionary with its own keys (name, run, etc.).
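Flow-style arrays show up in pipelines as well. As a rough sketch in GitHub Actions syntax, a build matrix can list the versions to test against on one line. This is a fragment of a workflow, and the job name test and the version numbers are assumptions made for illustration:

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [18, 20]   # flow-style array of versions
    steps:
      - uses: actions/checkout@v2
      - name: Show the selected version
        run: echo "Testing on Node ${{ matrix.node-version }}"
```

The job runs once per entry in the node-version array, while steps remains a block-style array of mappings.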
Conclusion
Arrays are an important part of YAML’s flexible, human-readable structure. Whether you’re working on complex infrastructure-as-code scenarios, large Kubernetes configurations, or simple lists of items, YAML’s array syntax offers multiple ways to represent sequences.