When it comes to learning new skills, building your knowledge and staying self-motivated can be the easy part. Becoming aware of what you don't know, understanding how these new skills apply in an enterprise environment, and having the humility to reach out to those who know more to understand their perspective and fill in the gaps can be even more important to leveling up as a technologist.
I've invested a lot of my time over the past year furthering my skills. I've become proficient in Python, immersed myself in the thought processes and tools driving modern DevOps, and dived head-first into AWS. So when I talked with Zach, a Cloud Engineer at the company I work for, about the best ways to advance my knowledge in the cloud, I walked away with two big takeaways:
The AWS console abstracts away or hides decisions that the average user typically doesn't need to worry about, in order to create a good experience for beginners. As a technologist, your job is to investigate those abstractions and understand why they matter and when the defaults need to be changed.
Serverless computing is the current Big Thing so a lot of attention and marketing is invested in it. However, traditional server-based computing models are not and will not be going away. 'The Cloud is just someone else's computer' has been a pithy putdown of cloud computing since its inception, but it's also factually true. Understanding 'the way things used to be' is and will always remain a great way to further your knowledge. (Hey, why is the default Windows install directory C:?)
With this knowledge whistling inside of my head, I knew I wanted to construct a project that would utilize AWS to host a web server. My project would live on this server and would have a persistent database that would interact with it. I bounced around a bunch of ideas until I finally figured out what I wanted to build.
The Project
To those who just want to check it out, click here. If you're more interested in the code, that is available on my GitHub profile here.
What I have built is an Online Guest Book application on an Amazon Linux 2 EC2 instance utilizing Apache Web Server (httpd) to serve the application over the internet. I constructed a web app in Flask that interfaces with an SQLite database. The infrastructure configuration process has been automated using Terraform, and the configuration management is handled via a bash script.
When users 'sign' the online guest book with their name and a brief message, they will see their message added to the list in real-time, as well as all the messages left previously.
If You Give a Mouse an EC2 Instance...
While AWS offers multiple ways to host compute power (an example that I didn't go with was AWS Lightsail, which is easier to configure but gives you less control over the configuration of the end product), I opted for the more options-rich EC2. While standing up an EC2 instance in and of itself is not that difficult, creating an EC2 instance that is equipped to serve as a web server and is ready and able to have traffic routed to it is a little more complicated. Because...
if you give a mouse an EC2 instance, he's going to need a Virtual Private Cloud for it to reside in.
When you give him the Virtual Private Cloud, he'll need a subnet to handle addressing...
When you give him the subnet to handle addressing, he'll need a routing table to route traffic outside of the VPC...
And if you give him the routing table, he'll need a security group to be able to properly authorize traffic.
And once you've done all that, he won't be satisfied until you give him a Route53 hosted zone to handle DNS!
While it's a much bigger project than it may appear at first glance, once you understand the full scope, creating the Terraform modules to provision each component is relatively trivial (most of them are a dozen lines of code or less). In the past, I've created assets by hand to get the lay of the land, then created a parallel architecture in Terraform, but that felt very inefficient. So for this project, I wanted to try building assets in Terraform from the very start. It requires a better understanding of the infrastructure you're building, but I think it's by far the best way to work.
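The chain of "he'll need..." steps above is really a dependency graph, and Terraform works out the creation order from a graph like it. As a rough illustration only (this is plain Python rather than Terraform, and the resource names and edges below are simplified assumptions, not the project's actual modules), the ordering can be sketched with the standard library's graphlib:

```python
from graphlib import TopologicalSorter

# Each key depends on the resources in its set.
# Names and edges are hypothetical and simplified for illustration.
dependencies = {
    "vpc": set(),
    "subnet": {"vpc"},
    "route_table": {"vpc"},
    "security_group": {"vpc"},
    "ec2_instance": {"subnet", "security_group"},
    "route53_record": {"ec2_instance"},
}

def creation_order(deps):
    """Return one valid order in which every resource follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = creation_order(dependencies)
print(order)
```

Several orderings are valid here; the only guarantee is that nothing is created before the things it depends on, which is why the VPC always comes first.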
Stacking it up
Once the EC2 instance was spun up, the next step was to configure the software stack that we would use to build, host, and display our web app. After doing some research, I settled on the following stack to utilize for this project:
Amazon Linux 2
Apache Web Server/httpd
SQLite3
Python/Flask
Amazon Linux 2
Amazon Linux 2 is AWS's own Linux distro. I opted for it because most of my experience up to this point was in Debian-based distros such as Ubuntu, whereas Amazon Linux 2 belongs to the RHEL family. While there aren't very many differences, I think it was a good experience to work in and become comfortable with yet another flavor of Linux. One of the biggest differences is package naming: apt calls the Apache package apache2, while yum, the package manager used by RHEL and by extension Amazon Linux 2, calls it httpd, which is indicative of Apache's extensive history as a web server solution.
Apache web server
Apache is an open source web server with a long history (it's been around since 1995!). In addition to having big investment from the industry (bloomberg.com and bbc.co.uk are both hosted via Apache, among others), Apache has a long history of support, so there's a wealth of documentation to help you resolve issues. On Amazon Linux 2 you can configure Apache as follows:
#!/bin/bash
sudo yum install httpd -y
#start apache server
sudo systemctl start httpd
#configure apache to automatically start on boot
sudo systemctl enable httpd
Once these commands have been executed, you can verify Apache is configured and working properly by navigating to the public IP of your EC2 instance. If you receive this splash screen, Apache is fully operational.
Once Apache is functional, the final step is to move the content to be displayed to the /var/www/html directory. To facilitate this, it is best to grant a user access to move files to this directory (otherwise only the apache user and user group that the httpd service uses, as well as the root user, have access to modify the directory). To grant the ec2-user access to make changes to the /var/www directory:
#add the ec2-user user to the apache user group
sudo usermod -a -G apache ec2-user
#grant the ec2-user user write access to the /var/www folder
sudo chown -R ec2-user:apache /var/www
sudo chmod 2775 /var/www && find /var/www -type d -exec sudo chmod 2775 {} \;
find /var/www -type f -exec sudo chmod 0664 {} \;
Once this has been done, we can add content to the /var/www/html directory using the ec2-user user, and it will be displayed instead of the Apache splash screen, which means we are ready to move on to creating the web app.
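The numeric modes in the chmod commands above are easier to reason about once they are decoded. As a small sketch using Python's standard stat module (nothing here touches the real /var/www; it only interprets the numbers):

```python
import stat

# 2775 on directories: setgid bit (the leading 2) plus rwx for owner and
# group, and r-x for everyone else.
dir_mode = stat.S_IFDIR | 0o2775
# 0664 on files: read/write for owner and group, read-only for everyone else.
file_mode = stat.S_IFREG | 0o664

dir_text = stat.filemode(dir_mode)
file_text = stat.filemode(file_mode)
print(dir_text)   # drwxrwsr-x
print(file_text)  # -rw-rw-r--

# With setgid on a directory, new files inherit the directory's group
# (apache here) instead of the creating user's primary group.
has_setgid = bool(dir_mode & stat.S_ISGID)
print(has_setgid)  # True
```

The "s" in the group position is the setgid bit, which is what keeps files created by ec2-user in the apache group.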
SQLite3
SQLite3 is a compelling alternative to heavier RDBMSes such as MySQL or PostgreSQL for small-scale projects such as this one. Rather than running a separate database server that the app connects to through drivers, it stores a SQL database in a single file that can live on your web server. While there are certainly limitations (since it's a file on one machine's disk and only one writer can modify it at a time, it precludes an architecture that load balances across multiple web servers), for a back-end database to a small web app, SQLite offers many benefits. (The home page of the SQLite project relies solely on SQLite for its database needs.)
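To make the "single file" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The temporary path and the notes table are hypothetical stand-ins chosen for the demo, not part of the guest book project:

```python
import os
import sqlite3
import tempfile

# Hypothetical throwaway location for the demo database file.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# First connection: creates the file, writes one row, then closes.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes VALUES (?)", ("hello",))
conn.commit()
conn.close()

# The whole database now lives in that one file on disk.
file_exists = os.path.isfile(db_path)

# A second, independent connection reads the same data back.
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT body FROM notes").fetchall()
conn.close()

print(file_exists, rows)  # True [('hello',)]
```

There is no server process to install or start; backing up the database is just copying the file.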
Python/Flask
While I do have some experience working with JavaScript, I'm much more comfortable using Python. So when it came time to decide on a back-end language, Python was a natural choice, and more specifically Flask. While Django is generally considered to be a more fully-featured Python web development framework, Flask is much more lightweight, containing only the bare essentials to get a project off the ground and allowing the user to install additional components as needed, which made it perfect for my needs.
Building the web app
With the software stack settled, I then broke the actual building of the app into several manageable pieces:
Create the database used to store/retrieve messages
Write the front end of the website
input form with 2 fields (end user's name/signature, as well as a message) that would write to the database on submission
a method to display messages from the database on the page for perusal
Write the Python scripts needed to allow the flask app to interface with the database
Building the database
Since I was going to use Python to build the app itself, it seemed natural to me to create the SQLite database using Python as well. Luckily, Python has a sqlite3 module which is part of the standard library, and I ended up with this:
import sqlite3
#establish connection to SQL database, creates the database.db file if it does not already exist
conn = sqlite3.connect('/var/www/html/FlaskApp/database.db')
cursor = conn.cursor()
#executes a query against the SQLite table, creates table according to specifications
cursor.execute('''CREATE TABLE visitors (
name TEXT,
message TEXT
);
'''
)
#sets the name column as unique
cursor.execute('CREATE UNIQUE INDEX visitor_name ON visitors (name);')
conn.commit()
#insert an initial row into the new table (values are bound via ? placeholders)
cursor.execute('INSERT INTO visitors VALUES (?, ?);', ('Matthew Ivancic', 'first'))
conn.commit()
conn.close()
In the early days of the internet, there was no person more revered than the 'first' commenter. Their ability to find an article mere seconds after it was released allowed them to be the first into the conversation. You may have said to yourself 'how did they read the article that quickly?!?!' or sworn to be faster next time in the hopes of being the one to comment 'first'. In this instance, you can sleep soundly knowing that there was never a real chance for you to be first (sorry!)
Building the front-end
I am not a front-end developer, which is maybe one of the larger benefits of working on full-stack personal projects like this. Bare minimum, I have to know enough HTML to lay out the basics of my website, and enough CSS to borrow a stylesheet that looks nice, and make it work well with my HTML, which furthers my knowledge in ways I might not have pursued on my own, making me a much more well-rounded developer. After some fussing and experimenting with how to create a form in HTML, I ended up with something like this, which means we're ready to tie it all together:
<form action="/" method="POST">
<label for="name">Your name:</label><br>
<input type="text" id="name" name="name"><br>
<label for="message">Your message:</label><br>
<input type="text" id="message" name="message"><br>
<p></p>
<button type="submit" class="btn btn-success">Sign the Guest Book!</button><br>
</form>
<br>
<hr>
<h5 align="center">Messages from our adoring fans</h5>
{% for i in guestbook %}
<li align="left">{{i}}</li>
{%endfor%}
</div>
</div>
</div>
Putting it all together
With a back-end database waiting for entries, as well as a front-end just looking for a database to send data to and from, it was time to play Computer Cupid and get them all set up, which looked a little like this:
from flask import Flask, render_template, request
import sqlite3

app = Flask(__name__)

@app.route('/', methods=['GET','POST'])
def index():
    conn = sqlite3.connect('/var/www/html/FlaskApp/database.db')
    cursor = conn.cursor()
    #only runs if data is received via POST (i.e. data is submitted via the
    #webpage); inserts data into database
    if request.method == 'POST':
        name = request.form['name']
        message = request.form['message']
        #? placeholders let sqlite3 bind the user-supplied values safely
        query = 'INSERT INTO visitors VALUES (?, ?)'
        try:
            cursor.execute(query, (name, message))
            conn.commit()
        except sqlite3.IntegrityError:
            #name has already signed (unique index on name), so skip the duplicate
            pass
    #queries all entries from database
    query = 'SELECT * FROM visitors;'
    cursor.execute(query)
    entries = cursor.fetchall()
    conn.close()
    #list comprehension that formats all entries for display
    guestbook = [f'"{x[1]}" - {x[0]}' for x in entries]
    return render_template('index.html', guestbook = guestbook)

if __name__ == '__main__':
    app.run(debug=True)
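One detail worth dwelling on is how user-supplied text reaches the database. Values are safest when handed to sqlite3 as bound parameters rather than formatted into the SQL string, since a name or message containing quotes would otherwise break the statement (or be used to inject SQL). A minimal sketch, assuming an in-memory database in place of database.db and a hypothetical sign helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for database.db
conn.execute("CREATE TABLE visitors (name TEXT, message TEXT)")

def sign(conn, name, message):
    """Insert one entry, letting sqlite3 bind the values via ? placeholders."""
    conn.execute("INSERT INTO visitors VALUES (?, ?)", (name, message))
    conn.commit()

# Quotes in the message are stored as plain data, not interpreted as SQL.
tricky_message = 'She said "hi"; it\'s fine'
sign(conn, "Ada", tricky_message)

stored = conn.execute("SELECT name, message FROM visitors").fetchall()
print(stored)
```

The stored row comes back exactly as typed, quotes included.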
A couple of interesting problems I ran into tying everything together:
I originally had a second route that would be called when you submitted an entry to the book to display. Unfortunately, if you tried to submit again, I couldn't get it to work, so I instead opted for the one-page structure, which works well for my purposes as once you submit, it reloads the page and your entry will be reflected on the page. Overall, I think this solution is much more elegant than the original page setup.
Once the page was working and entries were both submitted and displaying properly, I discovered that reloading the page would resubmit the same info, so there were duplicate entries in the database (this wasn't an issue with the two-page solution!). Reloading a page that was produced by a form submission makes the browser send the same POST again, even though the end user isn't intending to submit a new entry. I fixed this by creating a unique index for the name column in the table. So if you reload the page and therefore resubmit your message to the database, it will not be written a second time, and will not be displayed twice. (The trade-off is that each name can only sign the guest book once.)
Once these issues were worked through, we had a feature-complete Flask app on the web server ready for deployment, which was our final challenge.
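The effect of the unique index on resubmitted entries can be reproduced in isolation. This is a sketch under assumptions: it uses an in-memory database and a hypothetical try_sign helper rather than the Flask route itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for database.db
conn.execute("CREATE TABLE visitors (name TEXT, message TEXT)")
conn.execute("CREATE UNIQUE INDEX visitor_name ON visitors (name)")

def try_sign(conn, name, message):
    """Return True if the row was written, False if the name already exists."""
    try:
        conn.execute("INSERT INTO visitors VALUES (?, ?)", (name, message))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when the unique index on name is violated.
        return False

first = try_sign(conn, "Zach", "hello")
resubmitted = try_sign(conn, "Zach", "hello")               # simulated page reload
second_message = try_sign(conn, "Zach", "something else")   # same name, new text
count = conn.execute("SELECT COUNT(*) FROM visitors").fetchone()[0]

print(first, resubmitted, second_message, count)  # True False False 1
```

The last call shows the side effect of this design: the index blocks reload duplicates, but it also blocks a genuinely new message from a name that has already signed.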
WSGI in the jar
In addition to having its own frameworks for developing web applications, Python also has its own convention for serving them on the web: the Web Server Gateway Interface (WSGI). WSGI is a standard interface through which a web server forwards requests to a Python web application. Without it, Apache has no way to run the back-end code that the Flask app relies on. Our application requires WSGI both to write to the database and to retrieve entries and dynamically display them on the webpage.
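Underneath Flask, a WSGI application is just a callable that receives the request environment and a start_response function. The following is a bare-bones sketch using only the standard library's wsgiref helpers. It is not the guest book app, and the fake request and the captured dictionary are scaffolding for the demo:

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    """A bare WSGI app: reads the request environ, returns the response body."""
    body = f"You asked for {environ['PATH_INFO']}".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]

# Simulate what a WSGI server does for each incoming request.
environ = {}
setup_testing_defaults(environ)  # fills in a minimal fake GET request for "/"
captured = {}

def start_response(status, headers):
    # A real server would send these to the client; here we just record them.
    captured["status"] = status
    captured["headers"] = headers

response = b"".join(application(environ, start_response))
print(captured["status"], response)  # 200 OK b'You asked for /'
```

A Flask app object is itself a callable of this shape, which is why a WSGI server can host it.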
Fortunately for us, Flask comes with a built-in WSGI server we can call with flask run. Unfortunately for us, any time we call it this way, we receive a WARNING: This is a development server. Do not use it in a production deployment. message. flask run is a great way to test your applications and make sure they are working as intended, but when it comes time to share them with the world, a production-grade WSGI server is needed.
To install a production-grade WSGI server on Amazon Linux 2 (or any RHEL-based distro):
sudo yum install python3-mod_wsgi.x86_64 -y
NOTE: This package is the version of mod_wsgi compatible with Python 3. There is another package, mod_wsgi, that is designed with Python 2 in mind. If you install that one instead and your app uses features unique to Python 3, such as f-strings, your app will not work correctly.
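One way to catch this kind of mismatch early is a small interpreter check that runs before the app is imported. This is only a sketch: the interpreter_ok helper and the 3.6 minimum (the release that introduced f-strings) are assumptions for illustration, and the check is deliberately written without f-strings, since a Python 2 interpreter would fail to parse a file containing them before any check could run:

```python
import sys

MIN_VERSION = (3, 6)  # f-strings were added in Python 3.6

def interpreter_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the running interpreter is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum

if not interpreter_ok():
    # Uses str.format on purpose so this file still parses on older Pythons.
    raise RuntimeError(
        "Python {}.{} found, but {}.{} or newer is required".format(
            sys.version_info[0], sys.version_info[1], *MIN_VERSION
        )
    )
```

Placed in a file that is loaded first, a failure here would show up in the error log as a clear message instead of an obscure syntax error.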
Once WSGI has been fully installed, it must be configured to work with the app in question. I accomplished this with a file called app.wsgi located in the same directory as my Flask app:
import sys, sqlite3
sys.path.insert(0, '/var/www/html/FlaskApp')
from app import app as application
Once WSGI was configured to recognize our app, the final step was to configure Apache to interface with WSGI to display our app via the public IP of the web server. I accomplished this with an Apache config file, copied to the httpd configuration directory, /etc/httpd/conf.d:
<VirtualHost *:80>
ServerName 18.222.20.3
ServerAdmin matthew.ivancic91@gmail.com
WSGIDaemonProcess flaskapp user=apache group=apache threads=5
WSGIScriptAlias / /var/www/html/FlaskApp/app.wsgi
<Directory /var/www/html/FlaskApp>
WSGIProcessGroup flaskapp
WSGIApplicationGroup %{GLOBAL}
#Apache 2.4 access control (replaces the older Order/Allow directives)
Require all granted
</Directory>
ErrorLog /var/www/logs/error.log
</VirtualHost>
All of the documentation I read stated that the ServerName should reflect the public IP of the web server, but I can confirm that it is not necessary for a setup like this one. Apache uses ServerName to decide which virtual host should answer a request, and when only one VirtualHost is defined for the port, every request falls through to it regardless of the value. The setting becomes important when one server hosts several name-based virtual hosts, each serving its own application.
On Ubuntu or other Debian-based distros, the WSGIDaemonProcess will need to point to the user and group Apache uses on those OSes, which is www-data, as compared to apache, which is the value used on Amazon Linux.
If you configure an ErrorLog location as I did (which I strongly recommend, as it was invaluable in troubleshooting errors), the directory must already exist. If it does not, Apache will error out and your app will not be served, although the error.log file itself does not need to exist.
Finally, once this configuration is set, the httpd service will need to be reloaded before the changes take effect: sudo systemctl reload httpd
And once that was done, my site was online, displaying properly on the public internet and receiving and displaying data as it should!
Conclusion
Overall, I learned a ton working on this project. I had never worked with an RDBMS before, so getting to interact with SQL was a delight and filled in a lot of gaps in my knowledge. I was also really happy that my studies in Python prepared me well for this project, and I was honestly very surprised at how frictionless writing the scripts and building the Flask app eventually were. Moving forward, I think my next efforts in AWS will either deal with ECS/Fargate, or will use the AWS CLI/saml2aws to automate the process of delegating authority to a subdomain hosted zone.
If you've made it this far, I sincerely thank you for bearing with me. I hope reading this article filled in some holes for you and inspired you to keep working on your skills. Thanks for tuning in, and never stop coding.