Building a Scalable API in the AWS Cloud

We've been running our Node.js-based API in the AWS cloud for a couple of years but have been wanting to scale way up. Rather than trying to beef up our existing VPC (scale vertically), I made the decision to architect a new solution and rebuild it from the ground up. In this post I will talk about some of the technologies we leveraged, the challenges we ran into, and the benefits we have already seen.

I saw this as a great opportunity to try Elastic Beanstalk. Elastic Beanstalk is what is known as a PaaS (platform as a service) and it acts as a container for your applications and their respective environments. After creating an "application" you define a set of parameters corresponding to AWS services like EC2, RDS, ELB, etc., and boom, EB takes care of creating compute instances, database servers, load balancers, and more! And yes, I know what you are thinking, and it is just as magical as it sounds. As a part of this configuration you can set up autoscaling triggers which will scale your fleet of instances up/down based on the metrics you set. It was an absolute joy doing load testing and watching the fleet expand and contract like an accordion based on the amount of CPU being used.
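To make the accordion behavior concrete, here is a minimal sketch of how a CPU-based trigger decides the fleet size on each evaluation. It is a toy model, not Elastic Beanstalk's actual implementation. The thresholds, fleet bounds, and the names ScalingTrigger, next_fleet_size, and simulate are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalingTrigger:
    """Hypothetical CPU-based trigger; every number here is an illustrative assumption."""
    upper_cpu: float = 70.0   # scale out when average CPU % rises above this
    lower_cpu: float = 30.0   # scale in when average CPU % drops below this
    min_instances: int = 2
    max_instances: int = 12
    step: int = 1             # instances added or removed per evaluation


def next_fleet_size(current, avg_cpu, trigger):
    """Return the fleet size after one evaluation of the trigger."""
    if avg_cpu > trigger.upper_cpu:
        current += trigger.step
    elif avg_cpu < trigger.lower_cpu:
        current -= trigger.step
    # Never leave the configured bounds.
    return max(trigger.min_instances, min(trigger.max_instances, current))


def simulate(cpu_samples, start, trigger):
    """Feed a series of average-CPU readings through the trigger."""
    sizes = []
    size = start
    for cpu in cpu_samples:
        size = next_fleet_size(size, cpu, trigger)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    # A made-up load test: ramp up, hold, then back off.
    load_test = [20, 45, 80, 90, 85, 60, 25, 10, 10]
    print(simulate(load_test, start=2, trigger=ScalingTrigger()))
```

The gap between the upper and lower thresholds matters: if they sit too close together the fleet will flap between sizes on every evaluation.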

I went with this approach because we have a very steady base load that will only grow with the number of machines we have connected, and on top of that we have a number of highly variable customers who could cause huge spikes for short to medium durations. In the "olden days" we were forced to pay for servers which could handle not only the peak load but also the forecasted load based on our growth. Today, with solutions like Elastic Beanstalk and autoscaling, you are only paying for the fleet needed to support the current load and not the worst-case load. Clearly this is huge for the bottom line.
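A rough back-of-the-envelope comparison shows why this matters. The sketch below counts instance-hours for one day under two strategies: a fixed fleet sized for the peak, and a fleet that follows demand. The demand profile is invented for illustration (a steady 4 instances with a 3-hour spike to 12), and the model assumes scaling is instantaneous, which real autoscaling is not.

```python
import math


def fixed_fleet_hours(hourly_demand, headroom=0.0):
    """Instance-hours when the fleet is sized for the peak hour (plus optional growth headroom)."""
    provisioned = math.ceil(max(hourly_demand) * (1 + headroom))
    return provisioned * len(hourly_demand)


def autoscaled_hours(hourly_demand, min_instances=1):
    """Instance-hours when the fleet tracks demand, never dropping below a floor."""
    return sum(max(min_instances, needed) for needed in hourly_demand)


if __name__ == "__main__":
    # Hypothetical day: 21 quiet hours needing 4 instances, 3 spike hours needing 12.
    demand = [4] * 21 + [12] * 3
    print("fixed for peak:", fixed_fleet_hours(demand))
    print("autoscaled:    ", autoscaled_hours(demand))
```

With these made-up numbers the fixed fleet burns 288 instance-hours against 120 for the autoscaled one, and the gap widens further once you add headroom for forecasted growth.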

For this particular application we are leveraging the following AWS services:

  • Elastic Beanstalk
  • CloudFront
  • CloudWatch
  • EC2
  • ElastiCache
  • ELB
  • RDS
  • Route 53
  • S3
  • Trusted Advisor
  • VPC

A really neat aspect of using Elastic Beanstalk is not only the ability to monitor any component of your environment (via CloudWatch) but also to see a summary of things like CPU utilization and network I/O across your entire fleet.

The only major issue I had with this architecture was with WebSockets. Our mobile application uses Socket.IO for all communication between the client and server, and we kept seeing 400 errors on the socket connection. After a lot of reading and chatting with AWS engineers we found that the proxying being handled by the ELB + Nginx was causing the handshake to fail on the socket connection. The fix (take note, it could save you a major headache) was to change the ELB configuration (from the EB settings) from HTTP/HTTPS listeners to TCP on port 80 and SSL on 443. That, combined with removing Nginx from the stack, solved all of the socket issues, and even with hundreds of devices connected to dozens of servers in the fleet everything stays in sync.
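For anyone who prefers to script this rather than click through the console, the listener change can be expressed as Elastic Beanstalk option settings. The sketch below only builds the settings list; it does not call AWS. The namespaces and option names (aws:elb:listener:PORT, ListenerProtocol, InstanceProtocol, InstancePort, SSLCertificateId, and ProxyServer under aws:elasticbeanstalk:container:nodejs) are my understanding of the Classic Load Balancer and older Node.js platform options, so treat them as assumptions and check them against the current documentation for your platform version. The instance-side port and protocol, the function name, and the certificate ARN are placeholders that were not part of the original fix description.

```python
def websocket_friendly_settings(ssl_certificate_arn):
    """Build option settings for TCP on 80, SSL on 443, and no Nginx proxy.

    Returns a list of dicts in the Namespace/OptionName/Value shape used
    by Elastic Beanstalk option settings.
    """
    def opt(namespace, name, value):
        return {"Namespace": namespace, "OptionName": name, "Value": value}

    return [
        # Plain TCP pass-through instead of an HTTP listener.
        opt("aws:elb:listener:80", "ListenerProtocol", "TCP"),
        opt("aws:elb:listener:80", "InstanceProtocol", "TCP"),  # assumption
        opt("aws:elb:listener:80", "InstancePort", "80"),       # assumption
        # SSL (not HTTPS) so the ELB terminates TLS but does not parse HTTP.
        opt("aws:elb:listener:443", "ListenerProtocol", "SSL"),
        opt("aws:elb:listener:443", "InstanceProtocol", "TCP"),  # assumption
        opt("aws:elb:listener:443", "InstancePort", "80"),       # assumption
        opt("aws:elb:listener:443", "SSLCertificateId", ssl_certificate_arn),
        # Take Nginx out of the stack so Node.js receives connections directly.
        opt("aws:elasticbeanstalk:container:nodejs", "ProxyServer", "none"),
    ]


if __name__ == "__main__":
    # Placeholder ARN; substitute your own certificate.
    placeholder_arn = "arn:aws:iam::123456789012:server-certificate/example"
    for setting in websocket_friendly_settings(placeholder_arn):
        print(setting)
```

A list in this shape is what the OptionSettings parameter of boto3's Elastic Beanstalk update_environment call accepts. One trade-off worth knowing: with TCP/SSL listeners the load balancer no longer adds HTTP headers such as X-Forwarded-For, so the application cannot read the client IP from them.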

In closing, we couldn't have landed on a better method for keeping our API scalable, even if we spent a lot more dough. The peace of mind that comes from knowing that the environment will scale up when it needs to without any human intervention is near priceless.

Securing WordPress Admin

In today's online world security is a top concern. For WordPress maintainers, keeping your installation, themes, and plugins updated is an essential first step, but there are many other things that should be taken into consideration. Today I am curious to get your feedback specifically on locking down the admin section of your WordPress site. After collecting those results I will throw together a post with links to some really great material on locking things down in a more generalized fashion.

Now as far as the admin section goes, I use a server-side technique that requires both a password and my being at a specific IP address to access the wp-admin section. That may be too strict for most, particularly those on mobile, so here is a little snippet that will force Apache to require a user name and password (all of which are stored server-side) before even displaying the page.
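As a minimal sketch of what such a configuration might look like (not necessarily the exact rules running on this site), the helper below generates the contents of an .htaccess file for the wp-admin directory using Apache 2.4 directives. The function name, the password file path, and the IP address are hypothetical. Passing an IP produces the stricter password-plus-IP variant described above.

```python
def wp_admin_htaccess(htpasswd_path, allowed_ip=None):
    """Return .htaccess text that puts HTTP Basic auth in front of wp-admin.

    htpasswd_path: server-side file created with Apache's htpasswd tool.
    allowed_ip:    if given, require BOTH a valid login AND this client IP.
    """
    lines = [
        "AuthType Basic",
        'AuthName "Restricted"',
        f"AuthUserFile {htpasswd_path}",
    ]
    if allowed_ip is None:
        lines.append("Require valid-user")
    else:
        # RequireAll means every nested condition must hold.
        lines += [
            "<RequireAll>",
            "    Require valid-user",
            f"    Require ip {allowed_ip}",
            "</RequireAll>",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical path; keep the password file outside the web root.
    print(wp_admin_htaccess("/etc/apache2/.htpasswd"))
    # Stricter variant; 203.0.113.10 is a documentation-only example address.
    print(wp_admin_htaccess("/etc/apache2/.htpasswd", allowed_ip="203.0.113.10"))
```

The generated text goes in wp-admin/.htaccess, and the server needs to allow authentication overrides for that directory for it to take effect.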

*** Update: Sharing this post sparked a great discussion on Facebook and I learned something extremely useful. There is a Google Authenticator WordPress Plugin which enables two-factor authentication via the Google Authenticator app/service. I already use this technology for a variety of other sites/services so naturally it was a no-brainer to use it for WordPress.

Here is some important information to keep in mind when using this plugin from Justin Dessonville:

"The key is to make sure you have your wordpress general settings timezone set to whatever time zone your phone is in. I'm not sure how this would work if you travel across time zones a lot, but it could potentially lock you out if it's not setup right. Regardless, when both your phone & wordpress instance are set to the same time zone it's been solid for me."

To recap: after the update, I am still using Apache/htpasswd to protect the /wp-admin part of the site, but I have removed the IP checking in favor of the two-factor authentication. I feel just as secure (if not more) and I don't have to worry about tunneling in via VPN to satisfy the IP check.
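For the curious, here is a minimal sketch of how time-based one-time codes of this kind are computed, following the TOTP algorithm in RFC 6238 (HMAC-SHA1, 30-second steps). This is the published algorithm, not the plugin's source code, and the shared secret below is the RFC's test secret, not a real one. The code is derived from Unix time, which is the same in every time zone, so what has to agree between the phone and the server is the clock itself. How the plugin uses the WordPress time zone setting is not something this sketch models.

```python
import hashlib
import hmac
import struct


def totp(secret, unix_time, step=30, digits=6):
    """Compute a time-based one-time code as described in RFC 6238."""
    # Both sides count how many whole steps have elapsed since the epoch.
    counter = unix_time // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks an offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    test_secret = b"12345678901234567890"  # test secret from the RFC
    # Same 30-second window, same code, regardless of where the clock lives.
    print(totp(test_secret, 30), totp(test_secret, 59))
    # One second later the window rolls over and the code changes.
    print(totp(test_secret, 60))
```

If the phone's clock and the server's clock drift more than a window or so apart, the codes stop matching, which would look a lot like the lockout described above.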

Hosting for Business and Pleasure

In the world of web hosting there are so many options it's dizzying, each boasting that they are faster, cheaper, and easier to set up than the next. The purpose of this post is definitely not to wade through all of the murky waters of cloud-based web hosting but rather to shine some light on three hosts that I have a fair amount of experience with. I am going to briefly talk about some of the pros/cons as well as what I would recommend as the general use case for each of the three.

AWS

Obviously Amazon Web Services (AWS) has a huge presence in the hosting space and rightfully so. They are top notch when it comes to scalability, customization, features, and in a lot of cases pricing. With that said, why not go with them for everything? Personally, I think they are overkill for things like brochure/marketing sites, personal apps/blogs, etc. Now when it comes to scalable, distributed, high availability web applications, AWS is my top choice. I have been a part of two major migrations from other hosting platforms to AWS in the last 60 days and I can say from experience they provide some of the best platforms/tools in the business. I actually have a talk submitted for an upcoming conference that focuses on one of these migrations so I am going to hold off on going too much deeper into AWS in this post. What I will say is that we were able to leverage all of the following services for a single web application:

Digital Ocean

Touted as the "SSD-Only Cloud", Digital Ocean provides a slick interface to manage servers/keys/etc, and also has a great API. Beyond that it has one-click installations for the following apps/stacks:

In my mind, Digital Ocean is the best host for personal blogs/sites. You can scale the server up (vertically) at any time and you just pay for what you use. This blog is currently running on Digital Ocean and I pay next to nothing for hosting. Furthermore, I have full SSH access into my box(es) so I can tweak/customize to my heart's content. Getting a server (or in Digital Ocean terms, a "Droplet") up and running really couldn't be easier. If you are new to hosting or are just interested in focusing on development/publishing content rather than managing your server, Digital Ocean is a great option. If you do decide to give Digital Ocean a shot, please feel free to sign up using my referral link. I would certainly appreciate it!

Linode

This site (and many others done by @MitchellHislop and myself) has co-existed with them on the same Linode cluster for years. I have nothing but good things to say about Linode: they provide great support, offer a variety of Linux distributions on SSD-based servers, and have a bunch of other services. For a large web application that got a lot of traffic during the holidays (particularly after being featured on the news in two states) we were able to use "NodeBalancers" to take care of routing traffic to our various nodes and could swap nodes in and out as needed. They definitely don't have the feature set or instant scalability that AWS does, but they are certainly robust enough to run most apps. If you know your way around a Linux environment and have a general understanding of hosting, DNS, etc., Linode is a great place to call home.
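The idea behind swapping nodes in and out of a balancer can be sketched in a few lines. The toy pool below hands out backends in round-robin order and lets you add or remove a node without disturbing the rotation. It illustrates the concept only; it is not how NodeBalancers are implemented, round robin is just one possible routing strategy, and the class and node names are made up.

```python
class RoundRobinPool:
    """Toy load-balancer pool: round-robin picks with hot add/remove."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._next = 0  # index of the node the next request should go to

    def add(self, node):
        """Put a node into rotation (no-op if it is already there)."""
        if node not in self._nodes:
            self._nodes.append(node)

    def remove(self, node):
        """Take a node out of rotation, e.g. for maintenance."""
        if node in self._nodes:
            index = self._nodes.index(node)
            self._nodes.pop(index)
            # Keep the cursor pointing at the same upcoming node.
            if index < self._next:
                self._next -= 1

    def pick(self):
        """Return the backend for the next request."""
        if not self._nodes:
            raise RuntimeError("no nodes in rotation")
        self._next %= len(self._nodes)
        node = self._nodes[self._next]
        self._next += 1
        return node


if __name__ == "__main__":
    pool = RoundRobinPool(["web1", "web2", "web3"])
    print([pool.pick() for _ in range(4)])
    pool.remove("web2")   # pull a node for maintenance
    print([pool.pick() for _ in range(3)])
    pool.add("web4")      # bring a fresh node into rotation
    print(pool.pick())
```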

Summary

I'd say the big takeaway here is that there is no perfect host. There are always going to be some trade-offs, whether in functionality, ease of setup, or cost. If you are competing with the "big boys" in the web application space, then AWS is absolutely the right place to be. If you are looking to get a site up quickly and easily, then Digital Ocean is the place for you. If you are a developer/hacker looking to spin up boxes for a variety of purposes that need at least some ability to scale in both directions, then Linode is a perfect fit.

As I mentioned at the beginning of this post, I have left out A LOT of major players (Rackspace, MediaTemple, countless others), but these are the ones I have experience with, and I think they provide a nice representation of the different types of hosts out there which still provide you with basic needs like SSH, SFTP, etc.

I'd love to hear about your experiences in hosting. I know most people have strong opinions on the topic, so please share your story/thoughts in the comments.