Terraform

Edwin Pradeep
33 min read · Jun 1, 2017

A Comprehensive Guide to Terraform

A series of posts that will help you understand, and apply, best practices for using Terraform in the real world

Today, we are kicking off a series of blog posts on how to define and manage infrastructure-as-code in the real world using Terraform. If you haven’t used it before, Terraform is an open-source tool that allows you to define the infrastructure for a variety of cloud providers (e.g. AWS, Azure, Google Cloud, DigitalOcean, etc) using a simple, declarative programming language and to deploy and manage that infrastructure using a few CLI commands.

Why infrastructure-as-code?

A long time ago, in a data center far, far away, an ancient group of powerful beings known as sysadmins used to deploy infrastructure manually. Every server, every route table entry, every database configuration, and every load balancer was created and managed by hand. It was a dark and fearful age: fear of downtime, fear of accidental misconfiguration, fear of slow and fragile deployments, and fear of what would happen if the sysadmins fell to the dark side (i.e. took a vacation). The good news is that thanks to the DevOps Rebel Alliance, we now have a better way to do things: Infrastructure-as-Code (IAC).

Instead of clicking around a web UI or SSHing to a server and manually executing commands, the idea behind IAC is to write code to define, provision, and manage your infrastructure. This has a number of benefits:

  • You can automate your entire provisioning and deployment process, which makes it much faster and more reliable than any manual process.
  • You can represent the state of your infrastructure in source files that anyone can read rather than in a sysadmin’s head.
  • You can store those source files in version control, which means the entire history of your infrastructure is now captured in the commit log, which you can use to debug problems, and if necessary, roll back to older versions.
  • You can validate each infrastructure change through code reviews and automated tests.
  • You can create a library of reusable, documented, battle-tested infrastructure code that makes it easier to scale and evolve your infrastructure.

There is one other very important, and often overlooked, reason why you should use IAC: it makes developers happy. Deploying code is a repetitive and tedious task. A computer can do that sort of thing quickly and reliably, but a human will be slow and error-prone. Moreover, a developer will resent that type of work, as it involves no creativity, no challenge, and no recognition. You could deploy code perfectly for months, and no one will take notice — until that one day where you mess it up.

That creates a stressful and unpleasant environment. IAC offers a better alternative that allows computers to do what they do best (automation) and developers to do what they do best (coding).

Why Terraform?

There are many ways to do IAC, from something as simple as a hand-crafted shell script all the way up to a managed service such as Puppet Enterprise. Why did we pick Terraform as our IAC tool of choice? With Terraform, we can manage the entire lifecycle of our infrastructure as code: we declare infrastructure components in configuration files, and Terraform uses those files to provision, adjust, and tear down that infrastructure in various cloud providers.

What are the most popular tools in the market?

If you search the Internet for “infrastructure-as-code”, it’s pretty easy to come up with a list of the most popular tools: Chef, Puppet, Ansible, SaltStack, CloudFormation, Heat, and Terraform.

What’s not easy is figuring out which one of these you should use. All of these tools can be used to manage infrastructure as code. All of them are open-source, backed by large communities of contributors, and work with many different cloud providers (with the notable exception of CloudFormation, which is closed source and AWS-only). All of them offer enterprise support. All of them are well documented, both in terms of official documentation and community resources such as blog posts and StackOverflow questions. So how do you decide?

What makes this even harder is that most of the comparisons you find online between these tools do little more than list the general properties of each tool and make it sound like you could be equally successful with any of them. And while that’s technically true, it’s not helpful. It’s a bit like telling a programming newbie that you could be equally successful in building a website with PHP, C, or Assembly — a statement that’s technically true, but one that omits a huge amount of information that would be incredibly useful in making a good decision.

In this post, I will explain why you should pick Terraform over the other IAC tools.

What is the difference between configuration management and provisioning?

Chef, Puppet, Ansible, and SaltStack are all configuration management tools, which means they are designed to install and manage software on existing servers. CloudFormation and Terraform are provisioning tools, which means they are designed to provision the servers themselves (as well as the rest of your infrastructure, like load balancers, databases, networking configuration, etc), leaving the job of configuring those servers to other tools.

What is the difference between mutable and immutable infrastructure?

Configuration management tools such as Chef, Puppet, Ansible, and SaltStack typically default to a mutable infrastructure paradigm. For example, if you tell Chef to install a new version of OpenSSL, it’ll run the software update on your existing servers and the changes will happen in place. Over time, as you apply more and more updates, each server builds up a unique history of changes. This often leads to a phenomenon known as configuration drift, where each server becomes slightly different than all the others, leading to subtle configuration bugs that are difficult to diagnose and nearly impossible to reproduce.

If you’re using a provisioning tool such as Terraform to deploy machine images created by Docker or Packer, then every “change” is actually the deployment of a new server (just like every “change” to a variable in functional programming actually returns a new variable). For example, to deploy a new version of OpenSSL, you would create a new image using Packer or Docker with the new version of OpenSSL already installed, deploy that image across a set of totally new servers, and then un-deploy the old servers. This approach reduces the likelihood of configuration drift bugs, makes it easier to know exactly what software is running on a server, and allows you to trivially deploy any previous version of the software at any time. Of course, it’s possible to force configuration management tools to do immutable deployments too, but it’s not the idiomatic approach for those tools, whereas it’s a natural way to use provisioning tools.
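To make this concrete, here is a minimal sketch (the AMI IDs and names are hypothetical, not from any particular setup) of what an immutable rollout looks like in Terraform: upgrading the software means pointing the code at a newly baked image, and Terraform plans a replacement of the server rather than an in-place change:

variable "app_ami" {
  description = "AMI baked by Packer with the application and OpenSSL preinstalled"
  default     = "ami-11111111" # hypothetical ID of the current image
}

resource "aws_instance" "app" {
  # To roll out a new OpenSSL version, bake a new image (e.g. "ami-22222222")
  # and change var.app_ami; terraform plan then shows -/+ (replace the server),
  # not an in-place modification.
  ami           = "${var.app_ami}"
  instance_type = "t2.micro"
}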

What is the difference between master and masterless?

By default, Chef, Puppet, and SaltStack all require that you run a master server for storing the state of your infrastructure and distributing updates. Every time you want to update something in your infrastructure, you use a client (e.g., a command-line tool) to issue new commands to the master server, and the master server either pushes the updates out to all the other servers, or those servers pull the latest updates down from the master server on a regular basis.

A master server offers a few advantages. First, it’s a single, central place where you can see and manage the status of your infrastructure. Many configuration management tools even provide a web interface (e.g., the Chef Console, Puppet Enterprise Console) for the master server to make it easier to see what’s going on. Second, some master servers can run continuously in the background, and enforce your configuration. That way, if someone makes a manual change on a server, the master server can revert that change to prevent configuration drift.

However, having to run a master server has some serious drawbacks:

  • Extra infrastructure: You have to deploy an extra server, or even a cluster of extra servers (for high availability and scalability), just to run the master.
  • Maintenance: You have to maintain, upgrade, back up, monitor, and scale the master server(s).
  • Security: You have to provide a way for the client to communicate to the master server(s) and a way for the master server(s) to communicate with all the other servers, which typically means opening extra ports and configuring extra authentication systems, all of which increases your surface area to attackers.

Chef, Puppet, and SaltStack do have varying levels of support for masterless modes where you just run their agent software on each of your servers, typically on a periodic schedule (e.g., a cron job that runs every 5 minutes), and use that to pull down the latest updates from version control (rather than from a master server). This significantly reduces the number of moving parts, but, as discussed in the next section, this still leaves a number of unanswered questions, especially about how to provision the servers and install the agent software on them in the first place.

Ansible, CloudFormation, Heat, and Terraform are all masterless by default. Or, to be more accurate, some of them may rely on a master server, but it’s already part of the infrastructure you’re using and not an extra piece you have to manage. For example, Terraform communicates with cloud providers using the cloud provider’s APIs, so in some sense, the API servers are master servers, except they don’t require any extra infrastructure or any extra authentication mechanisms (i.e., just use your API keys). Ansible works by connecting directly to each server over SSH, so again, you don’t have to run any extra infrastructure or manage extra authentication mechanisms (i.e., just use your SSH keys).

What is the difference between agent-based and agentless?

Chef, Puppet, and SaltStack all require you to install agent software (e.g., Chef Client, Puppet Agent, Salt Minion) on each server you want to configure. The agent typically runs in the background on each server and is responsible for installing the latest configuration management updates.

This has a few drawbacks:

  • Bootstrapping: How do you provision your servers and install the agent software on them in the first place? Some configuration management tools kick the can down the road, assuming some external process will take care of this for them (e.g., you first use Terraform to deploy a bunch of servers with a VM image that has the agent already installed); other configuration management tools have a special bootstrapping process where you run one-off commands to provision the servers using the cloud provider APIs and install the agent software on those servers over SSH.
  • Maintenance: You have to carefully update the agent software on a periodic basis, being careful to keep it in sync with the master server if there is one. You also have to monitor the agent software and restart it if it crashes.
  • Security: If the agent software pulls down configuration from a master server (or some other server if you’re not using a master), then you have to open outbound ports on every server. If the master server pushes the configuration to the agent, then you have to open inbound ports on every server. In either case, you have to figure out how to authenticate the agent to the server it’s talking to. All of this increases your surface area to attackers.

Once again, Chef, Puppet, and SaltStack do have varying levels of support for agentless modes (e.g., salt-ssh), but these often feel like they were tacked on as an afterthought and don’t always support the full feature set of the configuration management tool. That’s why in the wild, the default or idiomatic configuration for Chef, Puppet, and SaltStack almost always includes an agent, and usually a master too.

All of these extra moving parts introduce a large number of new failure modes into your infrastructure. Each time you get a bug report at 3 a.m., you’ll have to figure out if it’s a bug in your application code, or your IAC code, or the configuration management client, or the master server(s), or the way the client talks to the master server(s), or the way other servers talk to the master server(s), or…

Ansible, CloudFormation, Heat, and Terraform do not require you to install any extra agents. Or, to be more accurate, some of them require agents, but these are typically already installed as part of the infrastructure you’re using. For example, AWS, Azure, Google Cloud, and all other cloud providers take care of installing, managing, and authenticating agent software on each of their physical servers. As a user of Terraform, you don’t have to worry about any of that: you just issue commands and the cloud provider’s agents execute them for you on all of your servers. With Ansible, your servers need to run the SSH daemon, which most servers run anyway.

What is the difference between a large community and a small community?

Whenever you pick a technology, you are also picking a community. In many cases, the ecosystem around the project can have a bigger impact on your experience than the inherent quality of the technology itself. The community determines how many people contribute to the project, how many plug-ins, integrations, and extensions are available, how easy it is to find help online (e.g., blog posts, questions on StackOverflow), and how easy it is to hire someone to help you (e.g., an employee, consultant, or support company).

It’s hard to do an accurate comparison between communities, but you can spot some trends by searching online. The table below shows a comparison of popular IAC tools, with data I gathered during May 2019, including whether the IAC tool is open source or closed source, what cloud providers it supports, the total number of contributors and stars on GitHub, how many commits and active issues there were over a one-month period from mid-April to mid-May, how many open source libraries are available for the tool, the number of questions listed for that tool on StackOverflow, and the number of jobs that mention the tool on Indeed.com.

What is the difference between mature and cutting edge?

Another key factor to consider when picking any technology is maturity. The table below shows the initial release dates and current version number (as of May 2019) for each of the IAC tools.

A comparison of IAC maturity as of May 2019.

Again, this is not an apples-to-apples comparison, since different tools have different versioning schemes, but some trends are clear. Terraform is, by far, the youngest IAC tool in this comparison. It’s still pre 1.0.0, so there is no guarantee of a stable or backward-compatible API, and bugs are relatively common (although most of them are minor). This is Terraform’s biggest weakness: although it has gotten extremely popular in a short time, the price you pay for using this new, cutting-edge tool is that it is not as mature as some of the other IAC options.

Using Multiple Tools Together

Although I’ve been comparing IAC tools this entire blog post, the reality is that you will likely need to use multiple tools to build your infrastructure. Each of the tools you’ve seen has strengths and weaknesses, so it’s your job to pick the right tool for the right job.

Here are three common combinations I’ve seen work well at a number of companies:

  1. Provisioning plus configuration management
  2. Provisioning plus server templating
  3. Provisioning plus server templating plus orchestration

Provisioning plus configuration management

Example: Terraform and Ansible. You use Terraform to deploy all the underlying infrastructure, including the network topology (i.e., VPCs, subnets, route tables), data stores (e.g., MySQL, Redis), load balancers, and servers. You then use Ansible to deploy your apps on top of those servers.

This is an easy approach to start with, as there is no extra infrastructure to run (Terraform and Ansible are both client-only applications) and there are many ways to get Ansible and Terraform to work together (e.g., Terraform adds special tags to your servers and Ansible uses those tags to find the servers and configure them). The major downside is that using Ansible typically means you’re writing a lot of procedural code, with mutable servers, so as your codebase, infrastructure, and team grow, maintenance may become more difficult.
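As a rough sketch of that hand-off (the tag names below are illustrative, not from any particular setup), Terraform tags the Instances it creates, and an Ansible dynamic inventory can then group hosts by those tags:

resource "aws_instance" "app" {
  ami           = "ami-2d39803a"
  instance_type = "t2.micro"

  tags {
    Name = "app-server"
    Role = "web" # an Ansible dynamic inventory (e.g. the EC2 inventory script)
                 # can group hosts by this tag and configure only the "web" ones
  }
}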

Provisioning plus server templating

Example: Terraform and Packer. You use Packer to package your apps as virtual machine images. You then use Terraform to deploy (a) servers with these virtual machine images and (b) the rest of your infrastructure, including the network topology (i.e., VPCs, subnets, route tables), data stores (e.g., MySQL, Redis), and load balancers.

This is also an easy approach to start with, as there is no extra infrastructure to run (Terraform and Packer are both client-only applications). Moreover, this is an immutable infrastructure approach, which will make maintenance easier. However, there are two major drawbacks. First, virtual machines can take a long time to build and deploy, which will slow down your iteration speed. Second, the deployment strategies you can implement with Terraform are limited (e.g., you can’t implement blue-green deployment natively in Terraform), so you either end up writing lots of complicated deployment scripts, or you turn to orchestration tools, as described next.
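One way to wire the two tools together (a sketch, assuming Packer publishes images named “my-app-*” in your own account) is to have Terraform look up the most recent Packer-built AMI with a data source and deploy servers from it:

data "aws_ami" "app" {
  most_recent = true
  owners      = ["self"] # only consider images built in your own account

  filter {
    name   = "name"
    values = ["my-app-*"] # assumed naming convention used by the Packer build
  }
}

resource "aws_instance" "app" {
  ami           = "${data.aws_ami.app.id}"
  instance_type = "t2.micro"
}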

Provisioning plus server templating plus orchestration

Example: Terraform, Packer, Docker, and Kubernetes. You use Packer to create a virtual machine image that has Docker and Kubernetes installed. You then use Terraform to deploy (a) a cluster of servers, each of which runs this virtual machine image and (b) the rest of your infrastructure, including the network topology (i.e., VPCs, subnets, route tables), data stores (e.g., MySQL, Redis), and load balancers. Finally, when the cluster of servers boots up, it forms a Kubernetes cluster that you use to run and manage your Dockerized applications.

The advantage of this approach is that Docker images build fairly quickly, you can run and test them on your local computer, and you can take advantage of all the built-in functionality of Kubernetes, including various deployment strategies, auto-healing, auto-scaling, and so on. The drawback is the added complexity, both in terms of extra infrastructure to run (Kubernetes clusters are difficult and expensive to deploy and operate, though most major cloud providers now provide managed Kubernetes services, which can offload some of this work), and in terms of several extra layers of abstraction (Kubernetes, Docker, Packer) to learn, manage, and debug.

Conclusion

Putting it all together, the table below shows how the most popular IAC tools stack up. Note that this table shows the default or most common way the various IAC tools are used, though as discussed earlier, these IAC tools are flexible enough to be used in other configurations, too (e.g., Chef can be used without a master, Salt can be used to do immutable infrastructure).

Now let’s learn the basics of Terraform with a step-by-step tutorial on how to deploy a cluster of web servers and a load balancer on AWS.

This guide is targeted at AWS and Terraform newbies, so don’t worry if you haven’t used either one before. We’ll walk you through the entire process, step-by-step:

  1. Set up your AWS account
  2. Install Terraform
  3. Deploy a single server
  4. Deploy a single web server
  5. Deploy a cluster of web servers
  6. Deploy a load balancer
  7. Clean up

Set up your AWS account

Terraform can provision infrastructure across many different types of cloud providers, including AWS, Azure, Google Cloud, DigitalOcean, and many others. For this tutorial, we picked Amazon Web Services (AWS) because it is the most popular cloud infrastructure provider and its free tier covers everything you need for the examples that follow.

When you first register for AWS, you initially sign in as the root user. This user account has access permissions to everything, so from a security perspective, we recommend only using it to create other user accounts with more limited permissions (see IAM Best Practices). To create a more limited user account, head over to the Identity and Access Management (IAM) console, click “Users”, and click the blue “Create New Users” button. Enter a name for the user and make sure “Generate an access key for each user” is checked:

Click the “Create” button and you’ll be able to see the security credentials for that user, which consist of an Access Key ID and a Secret Access Key. You MUST save these immediately, as they will never be shown again. We recommend storing them somewhere secure (e.g. a password manager such as Keychain or 1Password) so you can use them a little later in this tutorial.

Once you’ve saved the credentials, click “Close” (twice) and you’ll be taken to the list of users. Click on the user you just created and select the “Permissions” tab. By default, a new IAM user does not have permission to do anything in the AWS account. To be able to use Terraform for the examples in this tutorial, add the AmazonEC2FullAccess permission (learn more about Managed IAM Policies here):

Install Terraform

Follow the instructions here to install Terraform. When you’re done, you should be able to run the terraform command:

> terraform
usage: terraform [--version] [--help] <command> [args]
(...)

In order for Terraform to be able to make changes in your AWS account, you will need to set the AWS credentials for the user you created earlier as environment variables:

export AWS_ACCESS_KEY_ID=(your access key id)
export AWS_SECRET_ACCESS_KEY=(your secret access key)

Deploy a single server

Terraform code is written in a language called HCL in files with the extension “.tf”. It is a declarative language, so your goal is to describe the infrastructure you want, and Terraform will figure out how to create it. Terraform can create infrastructure across a wide variety of platforms, or what it calls providers, including AWS, Azure, Google Cloud, DigitalOcean, and many others. The first step to using Terraform is typically to configure the provider(s) you want to use. Create a file called “main.tf” and put the following code in it:

provider "aws" {
  region = "us-east-1"
}

This tells Terraform that you are going to be using the AWS provider and that you wish to deploy your infrastructure in the “us-east-1” region (AWS has data centers all over the world, grouped into regions and availability zones, and us-east-1 is the name for data centers in Virginia, USA). You can configure other settings for the AWS provider, but for this example, since you’ve already configured your credentials as environment variables, you only need to specify the region.
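As a sketch of a couple of optional settings (not needed for this tutorial), you can also point the provider at a named credentials profile instead of environment variables and pin the provider to a version range; the profile name below is hypothetical:

provider "aws" {
  region  = "us-east-1"
  profile = "terraform-tutorial" # hypothetical profile in ~/.aws/credentials
  version = "~> 1.60"            # pin the AWS provider to a compatible release
}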

For each provider, there are many different kinds of “resources” you can create, such as servers, databases, and load balancers. Before we deploy a whole cluster of servers, let’s first figure out how to deploy a single server that will run a simple “Hello, World” web server. In AWS lingo, a server is called an “EC2 Instance.” To deploy an EC2 Instance, add the following code to main.tf:

resource "aws_instance" "example" {
  ami           = "ami-2d39803a"
  instance_type = "t2.micro"
}

Each resource specifies a type (in this case, “aws_instance”), a name (in this case “example”) to use as an identifier within the Terraform code, and a set of configuration parameters specific to the resource. The aws_instance resource documentation lists all the parameters it supports. Initially, you only need to set the following ones:

  • ami: The Amazon Machine Image to run on the EC2 Instance. The example above sets this parameter to the ID of an Ubuntu 14.04 AMI in us-east-1.
  • instance_type: The type of EC2 Instance to run. Each EC2 Instance Type has a different amount of CPU, memory, disk space, and networking capacity. The example above uses “t2.micro”, which has 1 virtual CPU, 1GB of memory, and is part of the AWS free tier.

In a terminal, go into the folder where you created main.tf, and run the “terraform plan” command:

> terraform plan
Refreshing Terraform state in-memory prior to plan...
(...)
+ aws_instance.example
    ami:                      "ami-2d39803a"
    availability_zone:        "<computed>"
    ebs_block_device.#:       "<computed>"
    ephemeral_block_device.#: "<computed>"
    instance_state:           "<computed>"
    instance_type:            "t2.micro"
    key_name:                 "<computed>"
    network_interface_id:     "<computed>"
    placement_group:          "<computed>"
    private_dns:              "<computed>"
    private_ip:               "<computed>"
    public_dns:               "<computed>"
    public_ip:                "<computed>"
    root_block_device.#:      "<computed>"
    security_groups.#:        "<computed>"
    source_dest_check:        "true"
    subnet_id:                "<computed>"
    tenancy:                  "<computed>"
    vpc_security_group_ids.#: "<computed>"

Plan: 1 to add, 0 to change, 0 to destroy.

The plan command lets you see what Terraform will do before actually doing it. This is a great way to sanity-check your changes before unleashing them onto the world. The output of the plan command is a little like the output of the diff command: resources with a plus sign (+) are going to be created, resources with a minus sign (-) are going to be deleted, and resources with a tilde sign (~) are going to be modified. In the output above, you can see that Terraform is planning on creating a single EC2 Instance and nothing else, which is exactly what we want.

To actually create the instance, run the “terraform apply” command:

> terraform apply
aws_instance.example: Creating...
  ami:                      "" => "ami-2d39803a"
  availability_zone:        "" => "<computed>"
  ebs_block_device.#:       "" => "<computed>"
  ephemeral_block_device.#: "" => "<computed>"
  instance_state:           "" => "<computed>"
  instance_type:            "" => "t2.micro"
  key_name:                 "" => "<computed>"
  network_interface_id:     "" => "<computed>"
  placement_group:          "" => "<computed>"
  private_dns:              "" => "<computed>"
  private_ip:               "" => "<computed>"
  public_dns:               "" => "<computed>"
  public_ip:                "" => "<computed>"
  root_block_device.#:      "" => "<computed>"
  security_groups.#:        "" => "<computed>"
  source_dest_check:        "" => "true"
  subnet_id:                "" => "<computed>"
  tenancy:                  "" => "<computed>"
  vpc_security_group_ids.#: "" => "<computed>"
aws_instance.example: Still creating... (10s elapsed)
aws_instance.example: Still creating... (20s elapsed)
aws_instance.example: Creation complete

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Congrats, you’ve just deployed a server with Terraform! To verify this, you can log in to the EC2 console, and you’ll see something like this:

It’s working, but it’s not the most exciting example. For one thing, the Instance doesn’t have a name. To add one, you can add a tag to the EC2 instance:

resource "aws_instance" "example" {
  ami           = "ami-2d39803a"
  instance_type = "t2.micro"

  tags {
    Name = "terraform-example"
  }
}

Run the plan command again to see what this would do:

> terraform plan
aws_instance.example: Refreshing state... (ID: i-6a7c545b)
(...)
~ aws_instance.example
    tags.%:    "0" => "1"
    tags.Name: "" => "terraform-example"

Plan: 0 to add, 1 to change, 0 to destroy.

Terraform keeps track of all the resources it already created for this set of templates, so it knows your EC2 Instance already exists (note how Terraform says “Refreshing state…” when you run the plan command), and it can show you a diff between what’s currently deployed and what’s in your Terraform code (this is one of the advantages of using a declarative language over a procedural one). The diff above shows that Terraform wants to create a single tag called “Name”, which is exactly what we want, so you should run the “apply” command again. When you refresh your EC2 console, you’ll see:

Deploy a single web server

The next step is to run a web server on this Instance. In a real-world use case, you’d probably install a full-featured web framework like Ruby on Rails or Django, but to keep this example simple, we’re going to run a dirt-simple web server that always returns the text “Hello, World”, using code borrowed from the big list of HTTP static server one-liners:

#!/bin/bash
echo "Hello, World" > index.html
nohup busybox httpd -f -p 8080 &

This is a Bash script that writes the text “Hello, World” into index.html and uses busybox (which is installed by default on Ubuntu) to run a web server on port 8080 that serves the file at the URL “/”. We wrap the busybox command with nohup so the web server keeps running even after this script exits, and we put an “&” at the end of the command so the web server runs as a background process, allowing the script to exit rather than being blocked forever by the web server.

How do you get the EC2 Instance to run this script? Normally, instead of using an empty Ubuntu AMI, you would use a tool like Packer to create a custom AMI that has the web server installed on it. But again, in the interest of keeping this example simple, we’re going to run the script above as part of the EC2 Instance’s User Data, which AWS will execute when the instance is booting:

resource "aws_instance" "example" {
  ami           = "ami-2d39803a"
  instance_type = "t2.micro"

  user_data = <<-EOF
              #!/bin/bash
              echo "Hello, World" > index.html
              nohup busybox httpd -f -p 8080 &
              EOF

  tags {
    Name = "terraform-example"
  }
}

The “<<-EOF” and “EOF” are Terraform’s heredoc syntax, which allows you to create multiline strings without having to put “\n” all over the place (learn more about Terraform syntax here).

You need to do one more thing before this webserver works. By default, AWS does not allow any incoming or outgoing traffic from an EC2 Instance. To allow the EC2 Instance to receive traffic on port 8080, you need to create a security group:

resource "aws_security_group" "instance" {
  name = "terraform-example-instance"

  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

The code above creates a new resource called aws_security_group (notice how all resources for the AWS provider start with “aws_”) and specifies that this group allows incoming TCP requests on port 8080 from the CIDR block 0.0.0.0/0. CIDR blocks are a concise way to specify IP address ranges. For example, a CIDR block of 10.0.0.0/24 represents all IP addresses between 10.0.0.0 and 10.0.0.255. The CIDR block 0.0.0.0/0 is an IP address range that includes all possible IP addresses, so the security group above allows incoming requests on port 8080 from any IP.

Note that in the security group above, we copied & pasted port 8080. To keep your code DRY and to make it easy to configure the code, Terraform allows you to define input variables:

variable "server_port" {
  description = "The port the server will use for HTTP requests"
}

You can use this variable in your security group via Terraform’s interpolation syntax:

from_port = "${var.server_port}"
to_port   = "${var.server_port}"

You can also use the same syntax in the user_data of the EC2 Instance:

nohup busybox httpd -f -p "${var.server_port}" &

If you now run the plan or apply command, Terraform will prompt you to enter a value for the server_port variable:

> terraform plan
var.server_port
  The port the server will use for HTTP requests

  Enter a value: 8080

Another way to provide a value for the variable is to use the “-var” command-line option:

> terraform plan -var server_port="8080"

If you don’t want to enter the port manually every time, you can specify a default value as part of the variable declaration (note that this default can still be overridden via the “-var” command-line option):

variable "server_port" {
  description = "The port the server will use for HTTP requests"
  default     = 8080
}

One last thing to do: you need to tell the EC2 Instance to actually use the new security group. To do that, you need to pass the ID of the security group into the vpc_security_group_ids parameter of the aws_instance resource. How do you get this ID?

In Terraform, every resource has attributes that you can reference using the same syntax as interpolation. You can find the list of attributes in the documentation for each resource. For example, the aws_security_group attributes include the ID of the security group, which you can reference in the EC2 Instance as follows:

vpc_security_group_ids = ["${aws_security_group.instance.id}"]

The syntax is “${TYPE.NAME.ATTRIBUTE}”. When one resource references another resource, you create an implicit dependency. Terraform parses these dependencies, builds a dependency graph from them and uses that to automatically figure out in what order it should create resources (e.g. Terraform knows it needs to create the security group before using it with the EC2 Instance). In fact, Terraform will create as many resources in parallel as it can, which means it is very fast at applying your changes. That’s the beauty of a declarative language: you just specify what you want and Terraform figures out the most efficient way to make it happen.

If you run the plan command, you’ll see that Terraform wants to replace the original EC2 Instance with a new one that has the new user data (the “-/+” means “replace”) and to add a security group:

> terraform plan
(...)
-/+ aws_instance.example
    ami:                      "ami-2d39803a" => "ami-2d39803a"
    instance_state:           "running" => "<computed>"
    instance_type:            "t2.micro" => "t2.micro"
    security_groups.#:        "0" => "<computed>"
    vpc_security_group_ids.#: "1" => "<computed>"
(...)
+ aws_security_group.instance
    description:                         "Managed by Terraform"
    egress.#:                            "<computed>"
    ingress.#:                           "1"
    ingress.516175195.cidr_blocks.#:     "1"
    ingress.516175195.cidr_blocks.0:     "0.0.0.0/0"
    ingress.516175195.from_port:         "8080"
    ingress.516175195.protocol:          "tcp"
    ingress.516175195.security_groups.#: "0"
    ingress.516175195.self:              "false"
    ingress.516175195.to_port:           "8080"
    owner_id:                            "<computed>"
    vpc_id:                              "<computed>"

Plan: 2 to add, 0 to change, 1 to destroy.

This is exactly what we want, so run the apply command again and you’ll see your new EC2 Instance deploying:

In the description panel at the bottom of the screen, you’ll also see the public IP address of this EC2 Instance. Give it a minute or two to boot up and then try to curl this IP at port 8080:

> curl http://<EC2_INSTANCE_PUBLIC_IP>:8080
Hello, World

Yay, a working webserver! However, having to manually poke around the EC2 console to find this IP address is no fun. Fortunately, you can do better by specifying an output variable:

output "public_ip" {
  value = "${aws_instance.example.public_ip}"
}

We’re using the interpolation syntax again to reference the public_ip attribute of the aws_instance resource. If you run the apply command again, Terraform will not apply any changes (since you haven’t changed any resources), but it’ll show you the new output:

> terraform apply
aws_security_group.instance: Refreshing state... (ID: sg-db91dba1)
aws_instance.example: Refreshing state... (ID: i-61744350)

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

public_ip = 54.174.13.5

Input and output variables are a big part of what makes Terraform powerful, especially when combined with modules, a topic we’ll discuss in Part 4, How to create reusable infrastructure with Terraform modules.

Deploy a cluster of web servers

Running a single server is a good start, but in the real world, a single server is a single point of failure. If that server crashes, or if it becomes overwhelmed by too much traffic, users can no longer access your site. The solution is to run a cluster of servers, routing around servers that go down, and adjusting the size of the cluster up or down based on traffic (for more info, check out A Comprehensive Guide to Building a Scalable Web App on Amazon Web Services).

Managing such a cluster manually is a lot of work. Fortunately, you can let AWS take care of it for you by using an Auto Scaling Group (ASG). An ASG can automatically launch a cluster of EC2 Instances, monitor their health, automatically replace failed nodes, and adjust the size of the cluster in response to demand.

The first step in creating an ASG is to create a launch configuration, which specifies how to configure each EC2 Instance in the ASG. From deploying the single EC2 Instance earlier, you already know exactly how to configure it, and you can reuse almost exactly the same parameters in the aws_launch_configuration resource:

resource "aws_launch_configuration" "example" {
  image_id        = "ami-2d39803a"
  instance_type   = "t2.micro"
  security_groups = ["${aws_security_group.instance.id}"]

  user_data = <<-EOF
              #!/bin/bash
              echo "Hello, World" > index.html
              nohup busybox httpd -f -p "${var.server_port}" &
              EOF

  lifecycle {
    create_before_destroy = true
  }
}

The only new addition is the lifecycle block, which is required for using a launch configuration with an ASG. You can add a lifecycle block to any Terraform resource to customize its lifecycle behavior. One of the available lifecycle settings is create_before_destroy, which tells Terraform to always create a replacement resource before destroying an original (e.g. when replacing an EC2 Instance, always create the new Instance before deleting the old one).

The catch with the create_before_destroy parameter is that if you set it to true on resource X, you also have to set it to true on every resource that X depends on. In the case of the launch configuration, that means you need to set create_before_destroy to true on the security group:

resource "aws_security_group" "instance" {
  name = "terraform-example-instance"

  ingress {
    from_port   = "${var.server_port}"
    to_port     = "${var.server_port}"
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  lifecycle {
    create_before_destroy = true
  }
}

Now you can create the ASG itself using the aws_autoscaling_group resource:

resource "aws_autoscaling_group" "example" {
  launch_configuration = "${aws_launch_configuration.example.id}"

  min_size = 2
  max_size = 10

  tag {
    key                 = "Name"
    value               = "terraform-asg-example"
    propagate_at_launch = true
  }
}

This ASG will run between 2 and 10 EC2 Instances (defaulting to 2 for the initial launch), each tagged with the name “terraform-asg-example.” The configuration of each EC2 Instance is determined by the launch configuration you created earlier, which we reference using Terraform’s interpolation syntax.

To make this ASG work, you need to specify one more parameter: availability_zones. This parameter specifies into which availability zones (AZs) the EC2 Instances should be deployed. Each AZ represents an isolated AWS data center, so by deploying your Instances across multiple AZs, you ensure that your service can keep running even if some of the AZs fail. You could hard-code the list of AZs (e.g. set it to [“us-east-1a”, “us-east-1b”]), but each AWS account has access to a slightly different set of AZs, so you can use the aws_availability_zones data source to fetch the exact list for your account:

data "aws_availability_zones" "all" {}

A data source represents a piece of read-only information that is fetched from the provider (in this case, AWS) every time you run Terraform. In addition to availability zones, there are data sources to look up AMI IDs, IP address ranges, and the current user’s identity. Adding a data source to your Terraform templates does not create anything new; it’s just a way to retrieve dynamic data.
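For example (a small sketch), the aws_caller_identity data source returns details about the credentials Terraform is running with, and you reference its attributes just like a resource’s:

# Read-only lookup of the current AWS account and user
data "aws_caller_identity" "current" {}

output "account_id" {
  value = "${data.aws_caller_identity.current.account_id}"
}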

To use the data source, you reference it using the standard interpolation syntax:

resource "aws_autoscaling_group" "example" {
  launch_configuration = "${aws_launch_configuration.example.id}"
  availability_zones   = ["${data.aws_availability_zones.all.names}"]

  min_size = 2
  max_size = 10

  tag {
    key                 = "Name"
    value               = "terraform-asg-example"
    propagate_at_launch = true
  }
}

Deploy a load balancer

Before launching the ASG, there is one more problem to solve: now that you have many Instances, you need a load balancer to distribute traffic across all of them. Creating a load balancer that is highly available and scalable is a lot of work. Once again, you can let AWS take care of it for you by using an Elastic Load Balancer (ELB). To create an ELB with Terraform, you use the aws_elb resource:

resource "aws_elb" "example" {
  name               = "terraform-asg-example"
  availability_zones = ["${data.aws_availability_zones.all.names}"]
}

This creates an ELB that will work across all of the AZs in your account. Of course, the definition above doesn’t do much until you tell the ELB how to route requests. To do that, you add one or more “listeners” which specify what port the ELB should listen on and what port it should route the request to:

resource "aws_elb" "example" {
  name               = "terraform-asg-example"
  availability_zones = ["${data.aws_availability_zones.all.names}"]

  listener {
    lb_port           = 80
    lb_protocol       = "http"
    instance_port     = "${var.server_port}"
    instance_protocol = "http"
  }
}

In the code above, we are telling the ELB to receive HTTP requests on port 80 (the default port for HTTP) and to route them to the port used by the Instances in the ASG. Note that, by default, ELBs don’t allow any incoming or outgoing traffic (just like EC2 Instances), so you need to add a security group to explicitly allow incoming requests on port 80:

resource "aws_security_group" "elb" {
  name = "terraform-example-elb"

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

And now you need to tell the ELB to use this security group by adding the security_groups parameter:

resource "aws_elb" "example" {
  name               = "terraform-asg-example"
  security_groups    = ["${aws_security_group.elb.id}"]
  availability_zones = ["${data.aws_availability_zones.all.names}"]

  listener {
    lb_port           = 80
    lb_protocol       = "http"
    instance_port     = "${var.server_port}"
    instance_protocol = "http"
  }
}

The ELB has one other nifty trick up its sleeve: it can periodically check the health of your EC2 Instances and, if an Instance is unhealthy, it will automatically stop routing traffic to it. Let’s add an HTTP health check where the ELB will send an HTTP request every 30 seconds to the “/” URL of each of the EC2 Instances and only mark an Instance as healthy if it responds with a 200 OK:

resource "aws_elb" "example" {
  name               = "terraform-asg-example"
  security_groups    = ["${aws_security_group.elb.id}"]
  availability_zones = ["${data.aws_availability_zones.all.names}"]

  health_check {
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 3
    interval            = 30
    target              = "HTTP:${var.server_port}/"
  }

  listener {
    lb_port           = 80
    lb_protocol       = "http"
    instance_port     = "${var.server_port}"
    instance_protocol = "http"
  }
}

To allow these health check requests, you need to modify the ELB’s security group to allow outbound requests:

resource "aws_security_group" "elb" {
  name = "terraform-example-elb"

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

How does the ELB know which EC2 Instances to send requests to? You can attach a static list of EC2 Instances to an ELB using the ELB’s instances parameter, but with an ASG, instances will be launching and terminating dynamically all the time, so that won’t work. Instead, you can use the load_balancers parameter of the aws_autoscaling_group resource to tell the ASG to register each Instance in the ELB when that instance is booting:

resource "aws_autoscaling_group" "example" {
  launch_configuration = "${aws_launch_configuration.example.id}"
  availability_zones   = ["${data.aws_availability_zones.all.names}"]

  min_size = 2
  max_size = 10

  load_balancers    = ["${aws_elb.example.name}"]
  health_check_type = "ELB"

  tag {
    key                 = "Name"
    value               = "terraform-asg-example"
    propagate_at_launch = true
  }
}

Notice that we’ve also configured the health_check_type for the ASG to “ELB”. This tells the ASG to use the ELB’s health check to determine whether an Instance is healthy and to automatically replace Instances that the ELB reports as unhealthy.

One last thing to do before deploying the load balancer: let’s add its DNS name as an output so it’s easier to test if things are working:

output "elb_dns_name" {
  value = "${aws_elb.example.dns_name}"
}

Run the plan command to verify your changes, and if everything looks good, run apply. When apply completes, you should see the elb_dns_name output:

Outputs:

elb_dns_name = terraform-asg-example-123.us-east-1.elb.amazonaws.com

Copy this URL down. It’ll take a couple of minutes for the Instances to boot and show up as healthy in the ELB. In the meantime, you can inspect what you’ve deployed. Open up the ASG section of the EC2 console, and you should see that the ASG has been created:

If you switch over to the Instances tab, you’ll see the two instances in the process of launching:

And finally, if you switch over to the Load Balancers tab, you’ll see your ELB:

Wait for the “Status” indicator to say “2 of 2 instances in service.” This typically takes 1–2 minutes. Once you see it, test the elb_dns_name output you copied earlier:

> curl http://<elb_dns_name>
Hello, World

Success! The ELB is routing traffic to your EC2 Instances. Each time you hit the URL, it’ll pick a different Instance to handle the request. You now have a fully working cluster of web servers! As a reminder, the complete sample code for the example above is available at https://github.com/gruntwork-io/intro-to-terraform.

At this point, you can see how your cluster responds to firing up new Instances or shutting down old ones. For example, go to the Instances tab, and terminate one of the Instances by selecting its checkbox, selecting the “Actions” button at the top, and setting the “Instance State” to “Terminate.” Continue to test the ELB URL and you should get a “200 OK” for each request, even while terminating an Instance, as the ELB will automatically detect that the Instance is down and stop routing to it. Even more interestingly, a short time after the Instance shuts down, the ASG will detect that fewer than 2 Instances are running and automatically launch a new one to replace it (self-healing!). You can also see how the ASG resizes itself by changing the min_size and max_size parameters or adding a desired_capacity parameter to your Terraform code, as shown in the sketch below.
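For example, here is a sketch of the same ASG pinned to four Instances via desired_capacity (the value 4 is arbitrary):

resource "aws_autoscaling_group" "example" {
  launch_configuration = "${aws_launch_configuration.example.id}"
  availability_zones   = ["${data.aws_availability_zones.all.names}"]

  min_size         = 2
  max_size         = 10
  desired_capacity = 4 # must fall between min_size and max_size

  load_balancers    = ["${aws_elb.example.name}"]
  health_check_type = "ELB"

  tag {
    key                 = "Name"
    value               = "terraform-asg-example"
    propagate_at_launch = true
  }
}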

Of course, there are many other aspects to an ASG that we have not covered here. For a real deployment, you would also need to attach IAM roles to the EC2 Instances, set up a mechanism to update the EC2 Instances in the ASG with zero downtime, configure auto-scaling policies to adjust the size of the ASG in response to load, and ideally reuse a battle-tested, documented, production-ready version of the ASG, as well as other types of infrastructure such as Docker clusters, relational databases, VPCs, and more.
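As a sketch of just one of those pieces, attaching an IAM role to the Instances means creating a role and an instance profile, and then referencing the profile from the launch configuration (the role below has no permissions attached; which policies you grant it depends on your application):

resource "aws_iam_role" "example" {
  name = "terraform-asg-example"

  # Allow EC2 Instances to assume this role
  assume_role_policy = <<-EOF
  {
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }]
  }
  EOF
}

resource "aws_iam_instance_profile" "example" {
  name = "terraform-asg-example"
  role = "${aws_iam_role.example.name}"
}

# In the aws_launch_configuration "example" resource, you would then add:
#   iam_instance_profile = "${aws_iam_instance_profile.example.name}"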

Clean up

When you’re done experimenting with Terraform, it’s a good idea to remove all the resources you created so AWS doesn’t charge you for them. Since Terraform keeps track of what resources you created, cleanup is a breeze. All you need to do is run the destroy command:

> terraform destroy
Do you really want to destroy?
  Terraform will delete all your managed infrastructure.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value:


Once you type in “yes” and hit enter, Terraform will build the dependency graph and delete all the resources in the right order, using as much parallelism as possible. In about a minute, your AWS account should be clean again.

Here is another example of creating AWS resources using Terraform code:

variables.tf

variable "amis" {
  description = "Base AMI to launch the instances"
  default = {
    ap-southeast-1 = "ami-83a713e0"
    ap-southeast-2 = "ami-83a713e0"
  }
}
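
The files that follow also reference several variables that are not shown in this excerpt (vpc_cidr, public_subnet_cidr, private_subnet_cidr, region, and key_name). Here is a minimal sketch of how they might be declared; the defaults are assumptions chosen to match the CIDR blocks mentioned below:

variable "region" {
  description = "AWS region to deploy into"
  default     = "ap-southeast-1"
}

variable "vpc_cidr" {
  description = "CIDR block for the whole VPC"
  default     = "10.0.0.0/16"
}

variable "public_subnet_cidr" {
  description = "CIDR block for the public subnet"
  default     = "10.0.1.0/24"
}

variable "private_subnet_cidr" {
  description = "CIDR block for the private subnet"
  default     = "10.0.2.0/24"
}

variable "key_name" {
  description = "Name of an existing EC2 key pair to attach to the instances"
}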

Let us define VPC with CIDR block of 10.0.0.0/16

vpc.tf

resource "aws_vpc" "default" {
  cidr_block           = "${var.vpc_cidr}"
  enable_dns_hostnames = true

  tags {
    Name = "terraform-aws-vpc"
  }
}

Define the gateway

gateway.tf

resource "aws_internet_gateway" "default" {
  vpc_id = "${aws_vpc.default.id}"

  tags {
    Name = "linoxide gw"
  }
}

Define public subnet with CIDR 10.0.1.0/24

public.tf

resource "aws_subnet" "public-subnet-in-ap-southeast-1" {
  vpc_id            = "${aws_vpc.default.id}"
  cidr_block        = "${var.public_subnet_cidr}"
  availability_zone = "ap-southeast-1a"
}

Define private subnet with CIDR 10.0.2.0/24

private.tf

resource "aws_subnet" "private-subnet-in-ap-southeast-1" {
  vpc_id            = "${aws_vpc.default.id}"
  cidr_block        = "${var.private_subnet_cidr}"
  availability_zone = "ap-southeast-1a"
}

Route table for public/private subnet

route.tf

resource "aws_route_table" "public-subnet-in-ap-southeast-1" {
  vpc_id = "${aws_vpc.default.id}"
}

resource "aws_route_table" "private-subnet-in-ap-southeast-1" {
  vpc_id = "${aws_vpc.default.id}"
}

Define NAT security group

natsg.tf

resource "aws_security_group" "nat" {
  name        = "vpc_nat"
  description = "NAT security group"

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["${var.private_subnet_cidr}"]
  }
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["${var.private_subnet_cidr}"]
  }
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  ingress {
    from_port   = -1
    to_port     = -1
    protocol    = "icmp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["${var.vpc_cidr}"]
  }
  egress {
    from_port   = -1
    to_port     = -1
    protocol    = "icmp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  vpc_id = "${aws_vpc.default.id}"
}

Define security group for Web

websg.tf

resource "aws_security_group" "web" {
  name        = "vpc_web"
  description = "Accept incoming connections."

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  ingress {
    from_port   = -1
    to_port     = -1
    protocol    = "icmp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 3306
    to_port     = 3306
    protocol    = "tcp"
    cidr_blocks = ["${var.private_subnet_cidr}"]
  }

  vpc_id = "${aws_vpc.default.id}"
}

Define security group for the database in private subnet

dbsg.tf

resource "aws_security_group" "db" {
  name        = "vpc_db"
  description = "Accept incoming database connections."

  ingress {
    from_port       = 3306
    to_port         = 3306
    protocol        = "tcp"
    security_groups = ["${aws_security_group.web.id}"]
  }
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["${var.vpc_cidr}"]
  }
  ingress {
    from_port   = -1
    to_port     = -1
    protocol    = "icmp"
    cidr_blocks = ["${var.vpc_cidr}"]
  }

  egress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  vpc_id = "${aws_vpc.default.id}"
}

Define web-server instance

webserver.tf

resource "aws_instance" "web-1" {
  ami                         = "${lookup(var.amis, var.region)}"
  availability_zone           = "ap-southeast-1a"
  instance_type               = "t2.micro"
  key_name                    = "${var.key_name}"
  vpc_security_group_ids      = ["${aws_security_group.web.id}"]
  subnet_id                   = "${aws_subnet.public-subnet-in-ap-southeast-1.id}"
  associate_public_ip_address = true
  source_dest_check           = false
}

Define DB instance

dbinstance.tf

resource "aws_instance" "db-1" {
  ami                    = "${lookup(var.amis, var.region)}"
  availability_zone      = "ap-southeast-1a"
  instance_type          = "t2.micro"
  key_name               = "${var.key_name}"
  vpc_security_group_ids = ["${aws_security_group.db.id}"]
  subnet_id              = "${aws_subnet.private-subnet-in-ap-southeast-1.id}"
  source_dest_check      = false
}

Define NAT instance

natinstance.tf

resource "aws_instance" "nat" {
  ami                         = "ami-1a9dac48" # a special AMI preconfigured to do NAT
  availability_zone           = "ap-southeast-1a"
  instance_type               = "t2.micro"
  key_name                    = "${var.key_name}"
  vpc_security_group_ids      = ["${aws_security_group.nat.id}"]
  subnet_id                   = "${aws_subnet.public-subnet-in-ap-southeast-1.id}"
  associate_public_ip_address = true
  source_dest_check           = false
}

Allocate an Elastic IP (EIP) for the NAT and web instances

eip.tf

resource "aws_eip" "nat" {
  instance = "${aws_instance.nat.id}"
  vpc      = true
}

resource "aws_eip" "web-1" {
  instance = "${aws_instance.web-1.id}"
  vpc      = true
}

Execute terraform plan first to find out what Terraform will do, and use it as a final check of your infrastructure changes before running terraform apply.
