<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Hewi's Blog]]></title><description><![CDATA[Hewi's Blog]]></description><link>https://hewi.blog</link><generator>RSS for Node</generator><lastBuildDate>Tue, 14 Apr 2026 09:22:38 GMT</lastBuildDate><atom:link href="https://hewi.blog/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Rate Limiting Algorithms in depth]]></title><description><![CDATA[Introduction
In today’s world, rate limiting has become essential for system stability and fairness. Whether you’re running an AI SaaS platform and want to keep free trial users in check, or protecting your API server from resource starvation, a good...]]></description><link>https://hewi.blog/rate-limiting-algorithms-in-depth</link><guid isPermaLink="true">https://hewi.blog/rate-limiting-algorithms-in-depth</guid><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[rate-limiting]]></category><category><![CDATA[General Programming]]></category><category><![CDATA[architecture]]></category><category><![CDATA[System Design]]></category><category><![CDATA[System Architecture]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Sun, 31 Aug 2025 21:27:02 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1756675551006/242e6e62-01f0-4141-875d-b546fb8ad464.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>In today’s world, rate limiting has become essential for system stability and fairness. Whether you’re running an AI SaaS platform and want to keep free trial users in check, or protecting your API server from resource starvation, a good rate limiting strategy ensures your service stays reliable under pressure. In this article, I’ll walk you through the most widely used algorithms that power rate limiting in modern systems.</p>
<p>We’ll explore each algorithm in depth, covering its pros and cons. Let’s dive in.</p>
<h1 id="heading-token-bucket-algorithm">Token Bucket Algorithm</h1>
<p>Imagine this: we have a bucket, and every <strong>T seconds</strong> a token is dropped into it. Each incoming request needs to grab a token from the bucket in order to pass through. If the bucket is empty, requests have to wait until new tokens arrive. Because the bucket can hold multiple tokens, clients are allowed to make short bursts of requests — but over time, the rate is capped by how quickly tokens refill.</p>
<p>That’s the token bucket algorithm in a nutshell.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756667165971/52c7ce5a-6a0c-4fce-b6db-a206e6150dda.gif" alt class="image--center mx-auto" /></p>
<p>When the bucket is empty, incoming requests can’t proceed. Most implementations reject them immediately (think <code>429 Too Many Requests</code>). Others choose to queue requests until tokens are available, but that’s not part of the core Token Bucket algorithm — it’s an extra design decision depending on whether you value strict protection or smoother user experience.</p>
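<p>Here’s a minimal single-process sketch of the idea in Python (names and numbers are illustrative; a production limiter would usually keep this state per client in a shared store like Redis):</p>
<pre><code class="lang-python">import time

class TokenBucket:
    """Minimal token bucket: refill_rate tokens/sec, bursts up to capacity."""
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)        # start full
        self.last_refill = time.monotonic()

    def allow(self) -&gt; bool:
        now = time.monotonic()
        # Lazily add the tokens accrued since the last call, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens &gt;= 1:
            self.tokens -= 1
            return True
        return False  # out of tokens: caller maps this to 429 Too Many Requests

limiter = TokenBucket(capacity=50, refill_rate=10)  # 10 tokens/sec, bursts of 50
</code></pre>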
<h2 id="heading-pros">Pros:</h2>
<ol>
<li><p>Simple concept to grasp and implement</p>
</li>
<li><p>Flexible as you can control the refill speed and bucket capacity</p>
</li>
<li><p>Throughput of an API is capped by the token refill rate</p>
</li>
</ol>
<h2 id="heading-cons">Cons:</h2>
<ol>
<li><p><strong>Burst allowance can be risky</strong>: If many tokens are available (e.g., 50), the algorithm allows all 50 requests in an instant — which can look like a traffic spike (“thundering herd”) and overload downstream systems.</p>
</li>
<li><p><strong>Stateful per client</strong>: Requires tracking a bucket for every user/client/IP, which can add memory overhead at scale.</p>
</li>
<li><p><strong>Queuing isn’t built-in</strong>: If you want to delay rather than drop requests, you need extra queuing infrastructure.</p>
</li>
</ol>
<h1 id="heading-leaky-bucket-algorithm">Leaky Bucket Algorithm</h1>
<p>This algorithm approaches rate limiting from a different perspective. Imagine the same bucket, but this time it has a tiny hole at the bottom, leaking water at a constant rate. Each incoming request is like a drop of water added to the bucket. As long as there’s space, the request goes in and eventually drips out at the steady leak rate. But once the bucket is full, any additional drops (requests) simply spill over and get dropped.</p>
<p>Code-wise, the steady flow is enforced by a <strong>timer</strong> (or tick) that “releases” requests at a fixed interval. It effectively acts as a bounded buffer: you fully control the leak rate and the bucket size.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756668291028/5f10c6b7-91dd-4527-9f04-881ee2bd5e3d.gif" alt class="image--center mx-auto" /></p>
<h2 id="heading-pros-1">Pros:</h2>
<ol>
<li><p>Requests flow steadily into our system, which solves the bursting issue of the token bucket algorithm.</p>
</li>
<li><p>It applies strong back pressure by rejecting any requests once the bucket is full.</p>
</li>
<li><p>Simple to understand and implement</p>
</li>
</ol>
<h2 id="heading-cons-1">Cons:</h2>
<ol>
<li><p>Increased response time, since queued requests are processed at the fixed leak rate rather than immediately</p>
</li>
<li><p>Anti-burst: this was a pro above, but it can be a con for use cases that legitimately need short bursts.</p>
</li>
</ol>
<h1 id="heading-fixed-window-counter">Fixed Window Counter</h1>
<p>Now we move on to a completely different analogy: this one is all about <strong>time</strong>. Imagine a window of fixed length — say one minute, from 1:00 to 1:59. Within that window, only a certain number of requests are allowed through, let’s say 100. Once the cap is reached, any additional requests during that same window are rejected outright. When the next window starts (2:00 to 2:59), the counter resets, and the process repeats.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1670072460996/8VYwSPQac.png?auto=compress,format&amp;format=webp" alt /></p>
<p>As we can see in the image above, once the requests exceed the dotted line they get discarded.</p>
<p>However, this algorithm has one major disadvantage, visible in the image below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1670072546451/aSRxcbfGe.png?auto=compress,format&amp;format=webp" alt /></p>
<p>The main flaw of the fixed window approach shows up when requests cluster around the <strong>boundary</strong>. For example:</p>
<ul>
<li><p>Between <strong>1:30 and 2:00</strong>, a client sends 100 requests (the max).</p>
</li>
<li><p>Then between <strong>2:00 and 2:30</strong>, they send another 100.</p>
</li>
</ul>
<p>Both windows independently look fine — each one respects the “100 requests per minute” limit.</p>
<p>But if you look at the overlapping interval <strong>1:30–2:30</strong> (which is still one minute long), the client actually made <strong>200 requests</strong>. This breaks the intended rate limit and can overload your system.</p>
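<p>Despite the flaw, this one is genuinely tiny to implement, which is its main appeal. A minimal Python sketch (per-client bookkeeping omitted for brevity):</p>
<pre><code class="lang-python">import time

class FixedWindowCounter:
    """Allow at most `limit` requests per clock-aligned `window`-second window."""
    def __init__(self, limit: int, window: int):
        self.limit = limit
        self.window = window
        self.current_window = -1
        self.count = 0

    def allow(self) -&gt; bool:
        window = int(time.time() // self.window)  # e.g. minute number since epoch
        if window != self.current_window:
            self.current_window = window          # a new window starts: reset
            self.count = 0
        if self.count &lt; self.limit:
            self.count += 1
            return True
        return False

limiter = FixedWindowCounter(limit=100, window=60)  # "100 requests per minute"
</code></pre>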
<h2 id="heading-pros-2">Pros:</h2>
<ol>
<li><p>Easiest of all rate-limit algorithms to implement (just a counter and a reset timer).</p>
</li>
<li><p>No need to store request timestamps or scan logs.</p>
</li>
<li><p>Easy to communicate (“100 requests per minute”) and easy for clients to reason about.</p>
</li>
</ol>
<h2 id="heading-cons-2">Cons:</h2>
<ol>
<li><p>Requests clustered around window edges can exceed the limit in any rolling interval (e.g., 200 requests in 60 seconds instead of 100).</p>
</li>
<li><p>Bursty clients can exploit reset points, while evenly spaced clients are constrained more strictly.</p>
</li>
</ol>
<h1 id="heading-sliding-window-log">Sliding Window Log</h1>
<p>This algorithm fixes the problem with the previous one. Instead of counting requests inside <strong>aligned</strong> windows (e.g., 2:00–2:59), we keep a <strong>timestamped log</strong> of <em>each</em> accepted request.</p>
<p>Let’s say we set a rule: <strong>maximum 5 requests per 10 seconds</strong>. Instead of fixed one-minute blocks, this window is always <em>moving forward in time</em>.</p>
<p>Each time a new request arrives, we do two things:</p>
<ol>
<li><p><strong>Prune old requests</strong> → remove any logged timestamps that are older than <code>now − 10 seconds</code>.</p>
</li>
<li><p><strong>Count the remaining requests</strong> → this gives us how many requests were made in the <em>current rolling window</em>.</p>
</li>
</ol>
<p>If the count is still below 5, the new request is allowed and added to the log. If the count is already at 5, the new request is rejected.</p>
<p>In this way, the algorithm enforces the rule across <em>any 10-second span</em>, not just neat boundaries like <code>0–10</code> or <code>10–20</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756671721995/8023279b-59d2-48b2-ba66-a90020366345.gif" alt class="image--center mx-auto" /></p>
<h2 id="heading-pros-3">Pros:</h2>
<ol>
<li><p>Enforces “no more than N requests in the last W seconds” with no fixed-window edge cases.</p>
</li>
<li><p>Boundary exploits (end-of-window spikes) don’t slip through.</p>
</li>
</ol>
<h2 id="heading-cons-3">Cons:</h2>
<ol>
<li><p>Memory heavy: it stores one timestamp per accepted request (hot clients = big logs).</p>
</li>
<li><p>CPU overhead: it must prune old timestamps on every request.</p>
</li>
<li><p>Heavy per-client load can bottleneck a single store key.</p>
</li>
</ol>
<h1 id="heading-sliding-window-counter">Sliding Window Counter</h1>
<p>Think of it as a <strong>compromise</strong> between Fixed Window (too sloppy) and Sliding Log (too heavy).</p>
<p>Instead of tracking every request across the whole window (which can be heavy: a 60-second window means storing a lot of log entries), we split the window into <strong>buckets</strong>: a 60-second window becomes 60 buckets of 1 second each, and each bucket just holds a counter.</p>
<p>For example, say we allow a maximum of <strong>10 requests per 60 seconds</strong>:</p>
<ul>
<li><p>At <strong>12:00:00–12:00:01</strong>, we got <strong>7 requests</strong> (in bucket A).</p>
</li>
<li><p>At <strong>12:00:01–12:00:02</strong>, we already have <strong>6 requests</strong> so far (in bucket B).</p>
</li>
<li><p>A new request comes in at <strong>12:00:01.5</strong> (halfway into bucket B).</p>
</li>
</ul>
<p>And we want to know: “How many requests happened in the last 60s?”</p>
<p>In the Sliding Window Counter, we look at all the buckets that overlap with the rolling window.</p>
<ul>
<li><p>Buckets that are <strong>fully inside</strong> the window are counted with a weight of <strong>1.0</strong> (their full value).</p>
</li>
<li><p>Buckets that are only <strong>partially inside</strong> the window — usually just the oldest and the newest — are counted with a fractional weight equal to how much of them overlaps with the window.</p>
</li>
</ul>
<p>So if the rolling window overlapped 4 buckets, 2 of them fully inside the window, the calculation would be as follows:</p>
<pre><code class="lang-plaintext">1.0 * (fully inside bucket A request count) 
+ 1.0 * (fully inside bucket B request count) 
+ (oldest bucket overlap fraction) * (oldest bucket request count)
+ (newest bucket overlap fraction) * (newest bucket request count)
</code></pre>
<p>If this total is below the threshold (10 requests per 60 seconds), we allow the request; otherwise we reject it.</p>
<blockquote>
<p>At first glance, the Sliding Window Counter sounds neat — split time into buckets, weight the edges, sum them all up. But think about it: if your window is 60 seconds with 1-second buckets, that’s <strong>60 buckets to check</strong> on every request. Bump that to a 5-minute window with millisecond precision and you’re suddenly tracking <strong>300,000 buckets</strong>. Looking at <em>every single bucket</em> quickly becomes wasteful. The math is overkill, and the per-request overhead will crush performance at scale.</p>
</blockquote>
<p>In practice, we don’t need to scan all the buckets. Notice that:</p>
<ul>
<li><p><strong>Middle buckets</strong> are either <strong>fully inside</strong> or <strong>fully outside</strong> the window → no weighting needed.</p>
</li>
<li><p>Only the <strong>oldest</strong> and <strong>newest</strong> buckets overlap partially and require fractional weights.</p>
</li>
</ul>
<p>So instead of summing hundreds (or thousands) of buckets, we can keep:</p>
<ol>
<li><p>A <strong>running total</strong> of all requests.</p>
</li>
<li><p>An adjustment for only the <strong>two edge buckets</strong> that partially overlap.</p>
</li>
</ol>
<p>This way, each request check is <strong>O(1)</strong> — constant time regardless of how long your window is or how fine-grained your buckets are.</p>
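<p>To make the O(1) trick concrete, here’s a hedged Python sketch of its most collapsed form, where the buckets degenerate into just two counters: the previous window’s total (weighted by how much of it still overlaps the rolling window) plus the current window’s running count:</p>
<pre><code class="lang-python">import time

class SlidingWindowCounter:
    """Approximate `limit` per `window` seconds using only two counters."""
    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.current_window = -1
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -&gt; bool:
        now = time.time()
        window = int(now // self.window)
        if window != self.current_window:
            # Slide: current becomes previous (zero if whole windows were skipped)
            self.previous_count = (
                self.current_count if window == self.current_window + 1 else 0
            )
            self.current_count = 0
            self.current_window = window
        elapsed = (now % self.window) / self.window  # fraction into current window
        # Previous window only counts for the part still inside the rolling window
        weighted = self.previous_count * (1 - elapsed) + self.current_count
        if weighted &lt; self.limit:
            self.current_count += 1
            return True
        return False
</code></pre>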
<h2 id="heading-pros-4">Pros:</h2>
<ol>
<li><p>Much lighter than Sliding Log: no per-request timestamps, just bucket counters.</p>
</li>
<li><p>Fairer than Fixed Window: smooths out boundary spikes by blending across buckets.</p>
</li>
<li><p>O(1) runtime per request, so it scales very well.</p>
</li>
</ol>
<h2 id="heading-cons-4">Cons:</h2>
<ol>
<li><p>Approximate, not exact: counts are close but not perfect (depending on bucket size).</p>
</li>
<li><p>Complexity: the implementation is trickier than Fixed Window or Token Bucket.</p>
</li>
</ol>
<h1 id="heading-summary">Summary</h1>
<p>Rate limiting is essential in modern applications, and the algorithm you choose should depend on your business model and what you expect your limiter to achieve. Whether you need strict fairness, lightweight performance, or burst tolerance, there’s a strategy that fits. Choose wisely — and happy coding!</p>
]]></content:encoded></item><item><title><![CDATA[Dump and Restore FOR PROCESSES?]]></title><description><![CDATA[Introduction
What’s going on everyone! I stumbled upon a very interesting project & have been playing around with it for the past couple of days, we all know about dumping and restoring data right? whether its databases or even raw files. However did...]]></description><link>https://hewi.blog/dump-and-restore-for-processes</link><guid isPermaLink="true">https://hewi.blog/dump-and-restore-for-processes</guid><category><![CDATA[Linux]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Devops]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Fri, 27 Jun 2025 15:05:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1751036663905/c20595e1-414e-46ef-b24e-db12987b90d4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>What’s going on everyone! I stumbled upon a very interesting project &amp; have been playing around with it for the past couple of days. We all know about dumping and restoring data, right? Whether it’s databases or even raw files. But did you know there’s a way to checkpoint a running process and restore it later on? 🤯</p>
<p>Imagine a long-running task that we want to move to a different VM. We could simply pause its execution, saving every single detail of the process (instruction pointers, stack pointers, all of its memory, etc.), and restore it so it continues as if nothing happened. Here’s where <a target="_blank" href="https://github.com/checkpoint-restore/criu">CRIU</a> shines.</p>
<h1 id="heading-criu">CRIU</h1>
<p><strong>CRIU</strong> (Checkpoint and Restore In Userspace) lets you <strong>freeze a running Linux process</strong>, save its entire state to disk, and <strong>restore it later</strong> — like nothing ever happened.</p>
<p>It works mostly in <strong>userspace</strong>, and supports complex features like open files, memory, TCP connections, and more.</p>
<p>🔗 <a target="_blank" href="http://criu.org/">criu.org</a> has all the docs and examples to get started.</p>
<h1 id="heading-demo">Demo</h1>
<p>I wanted to try something using this tool, and I’ll demo it here. Imagine a web server that takes in a request and needs some time to process it. Will dumping &amp; restoring while the request is processing still work &amp; actually return the response? The answer is yes, but let’s go into detail on what exactly happens to achieve this.</p>
<h2 id="heading-installing-criu">Installing CRIU</h2>
<p>CRIU <strong>doesn’t work on macOS</strong> because it relies on <strong>Linux kernel features</strong> that <strong>macOS doesn’t have</strong> — and likely never will.</p>
<p>However, I just spun up an Ubuntu VM on DigitalOcean to get this demo done.</p>
<p>CRIU supports Ubuntu up to 22.04; anything after that isn’t supported yet. If you have an Ubuntu version above 22.04 it won’t work.</p>
<p>We can use <code>apt</code> package manager to install it as follows</p>
<pre><code class="lang-bash">sudo add-apt-repository ppa:criu/ppa
sudo apt update
sudo apt install criu
</code></pre>
<p>Once installed we can verify using <code>criu --version</code></p>
<h2 id="heading-simple-counter-program-to-test-the-commands">Simple counter program to test the commands</h2>
<p>Before moving on to the web server thing, I wrote a bash script that basically prints counts &amp; sleeps between each iteration.</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/bash</span>

i=1
<span class="hljs-keyword">while</span> <span class="hljs-literal">true</span>; <span class="hljs-keyword">do</span>
  <span class="hljs-built_in">echo</span> <span class="hljs-string">"Count: <span class="hljs-variable">$i</span>"</span> &gt;&gt; /tmp/count.log
  sleep 2
  ((i++))
<span class="hljs-keyword">done</span>
</code></pre>
<p>Run this via <code>./count.sh &amp;</code>; the trailing <code>&amp;</code> makes it run in the background.</p>
<p>If we tail <code>/tmp/count.log</code> we can see that it prints counts</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751033270735/ca7cee37-130d-434e-993a-2e654d56e624.gif" alt class="image--center mx-auto" /></p>
<p>Now let’s try the command to checkpoint from CRIU</p>
<p>First we get the PID via <code>pgrep -f count</code> → <code>11786</code></p>
<p>Then <code>sudo criu dump -t 11786 -D /tmp/checkpoint --shell-job</code></p>
<p>Make sure a directory exists at <code>/tmp/checkpoint</code></p>
<p>The command above will checkpoint &amp; save the process status as files in the directory <code>/tmp/checkpoint</code></p>
<p>It will freeze, checkpoint &amp; kill the process.</p>
<p>The <code>--shell-job</code> flag is used here because it allows CRIU to checkpoint and restore processes that:</p>
<ul>
<li><p>Are <strong>attached to a terminal</strong></p>
</li>
<li><p>Were started from a <strong>shell</strong></p>
</li>
<li><p>Have a <strong>controlling terminal</strong></p>
</li>
</ul>
<p>This is to do with detaching it from the terminal’s process &amp; session groups so it doesn’t forward any signals to the process.</p>
<p>Now the process stops completely; in fact, it gets killed. But we can restore it using:</p>
<p><code>sudo criu restore -t 11786 -D /tmp/checkpoint --shell-job</code></p>
<p>On restore, the count picks back up again. However, if you restore multiple times it will always start from the checkpoint taken the first time (at count 50, for example), even if the restored process has since counted further; to capture new progress you have to dump again.</p>
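<p>If you end up doing this a lot, the two commands are easy to wrap. A hedged sketch with hypothetical helper names, mirroring the exact flags used above (assumes sudo rights and an existing images directory):</p>
<pre><code class="lang-python">import subprocess

def checkpoint(pid: int, images_dir: str = "/tmp/checkpoint") -&gt; None:
    """Freeze, dump and kill the process -- same as the manual dump command."""
    subprocess.run(
        ["sudo", "criu", "dump", "-t", str(pid), "-D", images_dir, "--shell-job"],
        check=True,
    )

def restore(pid: int, images_dir: str = "/tmp/checkpoint") -&gt; None:
    """Bring the process back from the saved images."""
    subprocess.run(
        ["sudo", "criu", "restore", "-t", str(pid), "-D", images_dir, "--shell-job"],
        check=True,
    )
</code></pre>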
<p>Now let’s create a simple Python web server that listens for a request and takes 10 seconds to process it, and see what happens 👀</p>
<h2 id="heading-python-webserver">Python Webserver</h2>
<p>First install python3</p>
<p><code>sudo apt install python3 python3-pip</code></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> http.server <span class="hljs-keyword">import</span> BaseHTTPRequestHandler, HTTPServer
<span class="hljs-keyword">import</span> time

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">DelayedHandler</span>(<span class="hljs-params">BaseHTTPRequestHandler</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">do_GET</span>(<span class="hljs-params">self</span>):</span>
        time.sleep(<span class="hljs-number">10</span>)
        self.send_response(<span class="hljs-number">200</span>)
        self.end_headers()
        self.wfile.write(<span class="hljs-string">b"Done after checkpoint!"</span>)

server = HTTPServer((<span class="hljs-string">'0.0.0.0'</span>, <span class="hljs-number">8080</span>), DelayedHandler)
print(<span class="hljs-string">"Starting server on port 8080..."</span>)
server.serve_forever()
</code></pre>
<p>Run using <code>python3</code> <a target="_blank" href="http://webserver.py"><code>webserver.py</code></a> <code>&amp;</code></p>
<p>Let’s check if it’s working using curl:</p>
<pre><code class="lang-python">curl  http://&lt;VM-IP&gt;:<span class="hljs-number">8080</span>
Done after checkpoint!%
</code></pre>
<p>Now let’s run a request, dump &amp; restore mid-request, and see what happens</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751035392863/0c64a2e5-f810-43fd-96d1-e44ec183c72d.gif" alt class="image--center mx-auto" /></p>
<p>We can dump &amp; restore while in the middle of a request! When dumping, the <code>--tcp-established</code> flag makes sure everything related to the TCP sockets in the process is preserved. It pauses them, and the client has no idea what is happening.</p>
<p>This is easy on the same host because the IP address stays the same, of course, but across two different hosts we need to make sure the addresses resolve to the new host, otherwise it will fail.</p>
<h1 id="heading-summary">Summary</h1>
<p>This opens up room for endless ideas. Docker, for example, uses CRIU for container migration, where you’d want to move a container from one place to another. This has been a small but dense article; hope you enjoyed it &amp; see you in the next one!</p>
]]></content:encoded></item><item><title><![CDATA[Diving into the DevOps world: How cool is log rotating?]]></title><description><![CDATA[Introduction
What is happening everyone! Hope you’re all having an amazing start of the summer. In this article i’m diving into something I recently spent time reading about and was fascinated about how cool & useful it is.
In systems that produce lo...]]></description><link>https://hewi.blog/diving-into-the-devops-world-how-cool-is-log-rotating</link><guid isPermaLink="true">https://hewi.blog/diving-into-the-devops-world-how-cool-is-log-rotating</guid><category><![CDATA[Devops]]></category><category><![CDATA[backend]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Sun, 08 Jun 2025 12:40:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749386416723/47bc18d7-905a-4d72-b847-cb223d72caba.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>What is happening everyone! Hope you’re all having an amazing start to the summer. In this article I’m diving into something I recently spent time reading about and was fascinated by how cool &amp; useful it is.</p>
<p>In systems that produce logs, whether it’s a web server, database, background job, etc., the logs tend, over time, to grow in size. For example, a web server that writes to a log file on every request will see that file grow linearly as its users grow.</p>
<p>Over time this significantly eats into disk space, which can eventually make the whole system slower (peep the next article to know why).</p>
<p>The disk needed a hero. Enter log rotation.</p>
<p>Log rotation controls this whole process and prevents log files from growing unbounded (it makes everything more controllable).</p>
<p>You can fully control how the logs get rotated: either every fixed period of time or once the file reaches a certain size.</p>
<p>Older log files can then be compressed to reduce their size &amp; maybe sent to some archive to free up disk space. You can also control how many rotated files to keep before the oldest get deleted. So to summarize:</p>
<p><strong>Log rotation is a process or strategy</strong> that:</p>
<ul>
<li><p>Limits log file size or age</p>
</li>
<li><p>Archives old logs (optionally compressing them)</p>
</li>
<li><p>Starts fresh logs for continued logging</p>
</li>
<li><p>Enforces a retention policy (e.g. keep last 7 logs)</p>
</li>
</ul>
<p>Now that we understand what rotation is, let’s look at one of the most famous tools used for log rotation</p>
<h1 id="heading-logrotate">LogRotate</h1>
<p>Logrotate is one of the most popular and widely used log rotation tools, especially on <strong>Linux systems</strong>. It’s the de facto standard for rotating log files created by system services, daemons, and applications.</p>
<p>We’ll dive into a quick demo rotating logs for a service that produces logs nonstop</p>
<ul>
<li><p>To install on Mac <code>brew install logrotate</code></p>
</li>
<li><p>Ubuntu: <code>sudo apt install logrotate</code></p>
</li>
</ul>
<p>Once installed, the main entry point for the binary is the <code>logrotate.conf</code> file. On Mac it’s located at <code>/opt/homebrew/etc/logrotate.conf</code>.</p>
<p>If we take a look at it we’ll find the following</p>
<pre><code class="lang-bash"><span class="hljs-comment"># see "man logrotate" for details</span>

<span class="hljs-comment"># global options do not affect preceding include directives</span>

<span class="hljs-comment"># rotate log files weekly</span>
weekly

<span class="hljs-comment"># keep 4 weeks worth of backlogs</span>
rotate 4

<span class="hljs-comment"># create new (empty) log files after rotating old ones</span>
create

<span class="hljs-comment"># uncomment this if you want your log files compressed</span>
<span class="hljs-comment">#compress</span>

<span class="hljs-comment"># packages drop log rotation information into this directory</span>
include /opt/homebrew/etc/logrotate.d

<span class="hljs-comment"># system-specific logs may also be configured here.</span>
</code></pre>
<ul>
<li><p><code>weekly</code> rotates log files every week (not by itself: a cron job is required to run the logrotate binary, which then checks whether a week has passed and rotates). Different options exist: <code>daily</code>, <code>hourly</code> &amp; more</p>
</li>
<li><p><code>rotate 4</code> keeps the last 4 rotations and deletes the rest.</p>
</li>
<li><p><code>create</code> creates new log files after rotating. Be careful when using this, because any process dumping logs holds an open file descriptor to the old log file; when a new one is created, you have to make the process reopen it to point at the new file (see the sketch after this list). <code>copytruncate</code> is a lifesaver here because it copies the logs to a new file and truncates the original in place, meaning the file descriptor still points at the same file.</p>
</li>
<li><p><code>compress</code> uses gzip to compress the rotated files to reduce size</p>
</li>
<li><p><code>include /opt/homebrew/etc/logrotate.d</code> basically is for application management where in the <code>logrotate.d</code> directory you can create different configurations for different applications each customized with their own set of settings. (We’ll dive into more below)</p>
</li>
<li><p>Example of a simple app I created <code>/opt/homebrew/etc/logrotate.d/myapp</code></p>
<pre><code class="lang-bash">  /var/<span class="hljs-built_in">log</span>/myapp/*.<span class="hljs-built_in">log</span> <span class="hljs-comment"># specify the path of the logs to operate on</span>

  {
  size 5k <span class="hljs-comment"># Rotate when size is 5 kilobytes, K is kilobytes, M is megabytes, etc (was testing that's why its low) </span>
  copytruncate <span class="hljs-comment"># copytruncate basically instead of creating a new file copies the data from the file to another file then empties it</span>
  compress <span class="hljs-comment"># gzip old logs</span>
  rotate 3 <span class="hljs-comment"># keep last 3 rotations</span>
  missingok <span class="hljs-comment"># if no logs are found don't panic</span>
  notifempty <span class="hljs-comment"># don't rotate if the log is empty.</span>
  }
</code></pre>
</li>
</ul>
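<p>To see why <code>create</code> bites, here’s a small Python experiment demonstrating that an open file descriptor follows the inode, not the path: rename the file and the writer keeps writing to the renamed one.</p>
<pre><code class="lang-python">import os

f = open("/tmp/demo.log", "a")  # the "service" holding an open fd
f.write("before rotation\n")
f.flush()

os.rename("/tmp/demo.log", "/tmp/demo.log.1")  # what a naive rotation does

f.write("after rotation\n")  # still lands in demo.log.1!
f.flush()

print(open("/tmp/demo.log.1").read())  # both lines are here
# A fresh /tmp/demo.log receives nothing until the writer reopens the path,
# which is exactly the surprise copytruncate avoids.
</code></pre>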
<p>For all the configurations you can visit the man page for log rotate <a target="_blank" href="https://man7.org/linux/man-pages/man8/logrotate.8.html#:~:text=size%20size%20Log%20files%20are,size%20100G%20are%20all%20valid.">here</a></p>
<p>But with the setup above I have a simple script that writes logs</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/bash</span>

logfile=<span class="hljs-string">"/var/log/myapp/production.log"</span>

<span class="hljs-comment"># Function to generate a random log record</span>
<span class="hljs-function"><span class="hljs-title">generate_log_record</span></span>() {
    <span class="hljs-built_in">local</span> loglevel=(<span class="hljs-string">"INFO"</span> <span class="hljs-string">"WARNING"</span> <span class="hljs-string">"ERROR"</span>)
    <span class="hljs-built_in">local</span> services=(<span class="hljs-string">"web"</span> <span class="hljs-string">"database"</span> <span class="hljs-string">"app"</span> <span class="hljs-string">"network"</span>)
    <span class="hljs-built_in">local</span> timestamps=$(date +<span class="hljs-string">"%Y-%m-%d %H:%M:%S"</span>)
    <span class="hljs-built_in">local</span> random_level=<span class="hljs-variable">${loglevel[$RANDOM % ${#loglevel[@]}</span>]}
    <span class="hljs-built_in">local</span> random_service=<span class="hljs-variable">${services[$RANDOM % ${#services[@]}</span>]}
    <span class="hljs-built_in">local</span> message=<span class="hljs-string">"This is a sample log record for <span class="hljs-variable">${random_service}</span> service."</span>

    <span class="hljs-built_in">echo</span> <span class="hljs-string">"<span class="hljs-variable">${timestamps}</span> [<span class="hljs-variable">${random_level}</span>] <span class="hljs-variable">${message}</span>"</span>
}

<span class="hljs-comment"># Main loop to write log records every second</span>
<span class="hljs-keyword">while</span> <span class="hljs-literal">true</span>; <span class="hljs-keyword">do</span>
    log_record=$(generate_log_record)
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"<span class="hljs-variable">${log_record}</span>"</span> &gt;&gt; <span class="hljs-string">"<span class="hljs-variable">${logfile}</span>"</span>
    sleep 1
<span class="hljs-keyword">done</span>
</code></pre>
<p>Now we have:</p>
<ol>
<li><p>Log rotate setup for our application</p>
</li>
<li><p>Logs being created</p>
</li>
</ol>
<p>But we need a way to actually invoke the <code>logrotate</code> command, because it doesn’t run on its own, and here we have two options:</p>
<ul>
<li><p>A cron job that runs every specified period</p>
</li>
<li><p>Using a file watcher that watches file sizes and acts accordingly</p>
</li>
</ul>
<p>In my case, since I added the <code>size</code> config, I’ll use a file watcher. On Mac I used <a target="_blank" href="https://emcrisostomo.github.io/fswatch/"><code>fswatch</code></a></p>
<pre><code class="lang-bash">fswatch -0 /var/<span class="hljs-built_in">log</span>/myapp/production.log | <span class="hljs-keyword">while</span> <span class="hljs-built_in">read</span> -d <span class="hljs-string">""</span> event
<span class="hljs-keyword">do</span>
  size=$(<span class="hljs-built_in">stat</span> -f%z /var/<span class="hljs-built_in">log</span>/myapp/production.log)
  <span class="hljs-keyword">if</span> [ <span class="hljs-string">"<span class="hljs-variable">$size</span>"</span> -ge 5120 ]; <span class="hljs-keyword">then</span>
    sudo logrotate -f /opt/homebrew/etc/logrotate.conf
  <span class="hljs-keyword">fi</span>
<span class="hljs-keyword">done</span>
</code></pre>
<p>Basically, watch the size of <code>production.log</code>; once it’s more than 5 KB (small for testing) we rotate.</p>
<p>We can watch it in action:</p>
<p>Watch how <code>production.log</code> reaches 5 KB and gets rotated while the oldest archive, <code>production.log.3.gz</code>, gets deleted</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749385674956/8afef282-ab06-4986-9bd7-32f7eb419de0.gif" alt class="image--center mx-auto" /></p>
<p>You can do so much more with rotations: add scripts that run on every rotate, upload the rotated files to some cloud storage, etc. Endless ideas can be built here.</p>
<h1 id="heading-summary">Summary</h1>
<p>Rotation is cool.</p>
]]></content:encoded></item><item><title><![CDATA[Finding a Needle in a Haystack: How to Diff 800M+ Records Across Two Databases Without Losing Your Mind]]></title><description><![CDATA[Introduction
Hello guys! This is going to be a quick but interesting one. In design sometimes for faster response times & aggregation purposes we step away from the traditional relational databases and head over towards more analytical processing opt...]]></description><link>https://hewi.blog/finding-a-needle-in-a-haystack-how-to-diff-800m-records-across-two-databases-without-losing-your-mind</link><guid isPermaLink="true">https://hewi.blog/finding-a-needle-in-a-haystack-how-to-diff-800m-records-across-two-databases-without-losing-your-mind</guid><category><![CDATA[PostgreSQL]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[software development]]></category><category><![CDATA[ClickHouse]]></category><category><![CDATA[SQL]]></category><category><![CDATA[scalability]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Sat, 10 May 2025 12:03:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1746878558524/33910067-857f-4281-8e3c-e027eca04e9c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Hello guys! This is going to be a quick but interesting one. Sometimes, for faster response times &amp; aggregation purposes, we step away from traditional relational databases and head towards databases optimized for analytical processing (e.g. ClickHouse, which is a columnar data store).</p>
<p>Maintaining a source-of-truth database is important, and that will most probably be the relational one. We opt into syncing the two either synchronously or asynchronously, where data in Postgres, for example, is always passed over to ClickHouse, which we can read directly for aggregates and such.</p>
<p>I’m not here to talk about how to sync them together; the main goal is simply that both databases stay in sync, holding the same exact data either right now or eventually (but that’s a different story).</p>
<p>Sometimes they’ll deviate, meaning Postgres has more data than ClickHouse. That can happen because some messages weren’t processed as they should have been, network errors, etc.</p>
<p>Now you’ve figured out the reason and pushed a fix, but you want to re-sync the missing parts, and the next question arises:</p>
<blockquote>
<p>How do we sync the missing data again?</p>
</blockquote>
<p>The answer depends on the scale of the data involved. And let me tell you from experience: that’s the most important factor in answering the question.</p>
<p>I’ll explain the next parts in a series of questions and answers and hopefully the answers help you reach a conclusion by the end of the article</p>
<h1 id="heading-how-big-is-the-data">How big is the data?</h1>
<p>Data sizes can range from a few hundred thousand rows to a few million to almost billions of rows.</p>
<p>Smaller ranges have more options than larger ones: it’s easier and takes less time to resync when you don’t have a lot of rows.</p>
<p>Larger ranges are where things get tough. You’ll reach a point where you start questioning yourself but hey everything is a learning curve I guess.</p>
<p>After figuring this out we opt into asking the second question</p>
<h1 id="heading-how-big-is-the-delta">How big is the delta?</h1>
<p>The difference between the two databases, and the ratio of that delta to the total data size, is important to know.</p>
<p>Let me tell you before moving on that having a difference of 4000 records in a data size of 1b+ rows is something that’ll cause you headaches. This article is more aimed at that scale of data.</p>
<p>Now knowing the answers of the above two questions can already derive you to a solution</p>
<h1 id="heading-small-ish-data-sizes">Small-ish data sizes</h1>
<p>Small amounts of data (up to a few hundred thousand rows) can be easily re-synced. The most straightforward option is to fetch the ids (any unique key, really) from both sides and just compute the difference. It will take some memory, but not a crazy amount. Once you have the delta, just re-insert the missing rows into your analytical database and move on with your day.</p>
<h1 id="heading-larger-data-sizes">Larger data sizes</h1>
<p>Here is where the above solution will NEVER work under normal circumstances (unless you have a whopping 100gb RAM machine or something)</p>
<p>I’m going to stick with the 4000-records-out-of-800M+ problem here (we love extremes). The solutions provided for the questions below are the steps I took to actually solve this problem.</p>
<p>Now you’re going to have to play it carefully and ask new questions, and these questions depend on your database setup and how optimized it is.</p>
<p>The setup we’ll assume is that there’s no partitioning, no sharding, just indexes and vibes (not the best setup for this amount of data).</p>
<p>The main goal here is to stay away from memory. And by staying away from memory I mean any application level logic you attempt will end up failing miserably.</p>
<p><strong>Some questions to ask</strong></p>
<ol>
<li>Is there a way to narrow the search window? For example, do I have to search from the first record to the last, or can I close that gap a little bit? Indexes will help, but it still takes a long time.</li>
</ol>
<p>For example if I figured out that the 4000 records got dropped only in the past month then I can utilize some timestamp (hopefully indexed) and work around that for a start.</p>
<p>So the first task is to <strong>minimize the search field as much as you can</strong></p>
<ol start="2">
<li>What are the strengths of each database? Knowing this can help us make better decisions</li>
</ol>
<p>a- For example, ClickHouse is very strong at aggregating and crunching numbers due to its columnar nature (data is physically stored together by column, not by row). Knowing this, querying the ids over the date range specified in step 1 is quick (a few-seconds query), but instead of pulling them into memory we write them to a <code>csv</code> file on disk. Something like this:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Clickhouse query</span>
SELECT id 
FROM xxx 
WHERE xxx_date BETWEEN <span class="hljs-string">'2025-01-01'</span> AND <span class="hljs-string">'2025-01-31'</span> 
INTO OUTFILE <span class="hljs-string">'clickhouse_ids.csv'</span>
FORMAT CSV;
</code></pre>
<p>This will output all the ids that exist in that date range. They’re missing the 4000 records, but this is a solid step towards solving the problem.</p>
<p>b- Now in Postgres the goal is to take the ids from above and exclude them, finding the missing 4000.</p>
<p>Here comes a really cool trick I learned: creating a <strong>temporary Postgres table</strong>.</p>
<blockquote>
<p>Temporary tables are tables that get removed once the connection/session is closed: for example, create a temp table inside psql and it gets deleted on exit.</p>
</blockquote>
<p>The idea here is to create this table, dump the ids into it, and use it to query against the larger table. The memory overhead is <strong>offloaded to Postgres’ shared buffers, disk, and planner</strong> instead of the application.</p>
<p>Even if the table is big, Postgres manages the spillover efficiently, avoiding app memory pressure.</p>
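<p>If you’d rather script the load than paste commands into psql, here’s a hedged sketch using psycopg2 (an assumed driver; the DSN is hypothetical). It creates the temp table and bulk-loads the ClickHouse CSV; since temp tables vanish with the session, the diff query below must run on this same connection:</p>
<pre><code class="lang-python">import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # Session-scoped table: gone when the connection closes
    cur.execute("CREATE TEMP TABLE clickhouse_ids (id BIGINT PRIMARY KEY)")
    # Bulk-load the ClickHouse export straight from disk via COPY
    with open("clickhouse_ids.csv") as f:
        cur.copy_expert("COPY clickhouse_ids FROM STDIN WITH CSV", f)
    # ...then run the anti-join below on this same cursor
</code></pre>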
<p>Now executing something like this will be ideal</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Postgres psql</span>
\copy (
  SELECT p.id
  FROM xx p
  WHERE p.xx_date BETWEEN <span class="hljs-string">'2025-01-01'</span> AND <span class="hljs-string">'2025-01-31'</span>
  AND p.id NOT IN (
    SELECT id FROM clickhouse_ids
  )
) TO <span class="hljs-string">'missing_ids.csv'</span> WITH CSV
</code></pre>
<p>Given that we’ve properly indexed the table on the <code>date column</code> and the <code>id</code>, everything falls into the hands of the planner anyway.</p>
<p>Once this query finishes, you’ll have a 4000-row CSV of the missing ids. All that’s left is inserting them into ClickHouse.</p>
<h1 id="heading-summary">Summary</h1>
<p>Dealing with large data is definitely an experience. Application-level solutions have no power here, and thinking outside the box is definitely a must. This was an approach I personally used that worked better than I expected, so I thought I’d share. Thank you guys for tuning in &amp; see you in the next one!</p>
]]></content:encoded></item><item><title><![CDATA[Migrating Data to AWS, lessons learned]]></title><description><![CDATA[Introduction
Hey everyone, Recently i’ve been working on migrating 1.5TB PostgreSQL worth of data to AWS from another cloud provider. I wanted to document the journey on the different attempts that were made and what did not work and what did. This i...]]></description><link>https://hewi.blog/migrating-data-to-aws-lessons-learned</link><guid isPermaLink="true">https://hewi.blog/migrating-data-to-aws-lessons-learned</guid><category><![CDATA[Databases]]></category><category><![CDATA[AWS]]></category><category><![CDATA[migration]]></category><category><![CDATA[#dms]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Fri, 25 Apr 2025 12:58:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1745585838257/82b1f4c8-b037-4524-8e21-0badac562ae1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Hey everyone, recently I’ve been working on migrating 1.5TB worth of PostgreSQL data to AWS from another cloud provider. I wanted to document the journey: the different attempts that were made, what didn’t work and what did. This is going to be a short one because honestly I’m out of ideas at this moment in time, so forgive me 😅</p>
<h1 id="heading-context">Context</h1>
<p>The goal was to move everything to AWS, so part of that was essentially migrating the database.</p>
<h1 id="heading-using-aws-dms">Using AWS DMS 💀</h1>
<p>The first attempt was to migrate the data using AWS’s data migration tool, and let me tell you, the mistake that happened before everything else is that we should have spiked it out for a couple of days before making this decision.</p>
<p>We started moving data using the DMS tool, so basically just migrating data from one place to another. At first we had no idea it doesn’t copy the schema, so prepare yourself for the first headache.</p>
<h2 id="heading-headache-1-theres-no-schema">Headache 1: There’s no schema 💀</h2>
<p>After everything finished we tried it out, and it had the data, but we couldn’t create anything new (no sequences defined for ids); there were no indexes, no constraints, nothing. Then we realized the DMS tool doesn’t copy schema (wish I had read about that beforehand lol). Anyways, continuing the story, we decided to take a schema-only dump (<code>--schema-only</code>) using <code>pg_dump</code> and apply it on the new db. Aaaand we ran into another problem.</p>
<h2 id="heading-headache-2-default-values-for-columns">Headache 2: Default values for columns</h2>
<p>Turns out the DMS tool creates the bare minimum schema it needs to transfer the data (gotta have tables to be able to insert into them lol), and when it does this it skips the default values defined on columns. When we apply the schema, the CREATE TABLE commands (with correct default values) don’t run because the tables already exist. So indexes, sequences, constraints and triggers work, but the default values are gone.</p>
<p>Unless you decide to go through every single table and write ALTER statements to add the missing defaults, which really isn’t best practice if you think about it.</p>
<p>This problem spiraled into us trying different techniques to make it work, some of them were the following:</p>
<ol>
<li><p>Say screw it, apply the schema first, and accept a slower but integrity-preserving load.</p>
</li>
<li><p>Add the schema but disable indexes then enable afterwards</p>
</li>
<li><p>Try <code>pg_dump</code> using a <code>.sql</code> format, a <code>.dump</code> format, I don’t even remember the rest tbf</p>
</li>
</ol>
<p>And congrats guys we reached our 3rd, 4th and 5th headache of the day</p>
<h2 id="heading-headache-3-invalid-command-galore-in-sql-dumps">Headache 3: Invalid command galore in .sql dumps</h2>
<p>So we took a <code>.sql</code> dump, tried it, and well, a spam of <code>invalid command \N</code> errors started to appear, meaning the file couldn’t be parsed correctly when attempting to restore. I couldn’t care less at this point and decided to just try the <code>.dump</code> format to see if the error persisted, and it went away completely. But another one appeared (surprise).</p>
<h2 id="heading-headache-4-pgrestore-doesnt-work-if-you-have-generated-columns">Headache 4: pg_restore doesn’t work if you have generated columns</h2>
<p>A generated column in PostgreSQL is a column that is physically stored on disk but is generated from other columns in the table, something like this</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> users (
  first_name <span class="hljs-type">TEXT</span>,
  last_name <span class="hljs-type">TEXT</span>,
  full_name <span class="hljs-type">TEXT</span> <span class="hljs-keyword">GENERATED</span> <span class="hljs-keyword">ALWAYS</span> <span class="hljs-keyword">AS</span> (first_name || <span class="hljs-string">' '</span> || last_name) STORED
);
</code></pre>
<p>And apparently, when restoring the <code>.dump</code> using <code>pg_restore</code>, it worked for all the tables that don’t have generated columns, but for the ones that did:</p>
<pre><code class="lang-pgsql">pg_restore: error: could <span class="hljs-keyword">not</span> <span class="hljs-keyword">execute</span> query: ERROR:  <span class="hljs-keyword">column</span> "xx" <span class="hljs-keyword">is</span> a <span class="hljs-keyword">generated</span> <span class="hljs-keyword">column</span>
DETAIL:  <span class="hljs-keyword">Generated</span> <span class="hljs-keyword">columns</span> cannot be used <span class="hljs-keyword">in</span> <span class="hljs-keyword">COPY</span>.
Command was: <span class="hljs-keyword">COPY</span> foo (id, <span class="hljs-type">name</span>, ..., xx, ...) <span class="hljs-keyword">FROM stdin</span>;
</code></pre>
<p>And the COPY command fails completely. Turns out you can’t COPY generated columns with <code>pg_restore</code>.</p>
<p>We thought of dropping the columns, transferring the data, and re-adding the columns, but that didn’t work either.</p>
<h1 id="heading-what-ended-up-happening">What ended up happening</h1>
<p>We decided to go through with a <code>.dump</code> of the whole database &amp; schema, optimizing the database as much as we could for insertions and parallelizing the restore with <code>pg_restore</code>.</p>
<p>Some of the optimizations that were made were: (all in postgresql.conf file)</p>
<ol>
<li><p>Bumped shared buffers</p>
</li>
<li><p>Bumped <code>maintenance_work_mem</code> (Specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY.)</p>
</li>
<li><p>Bumped <code>checkpoint_timeout</code>, which controls how often Postgres syncs the actual data files with what the WAL has committed (it goes deeper, but that’s the tl;dr)</p>
</li>
<li><p>Disabled <code>fsync</code> and <code>synchronous_commit</code>, which is <strong>not production friendly</strong>; this basically skips flushing the WAL to disk on every write (sometimes writes get batched when it’s on, but that’s beside the point)</p>
</li>
<li><p>Disabled <code>full_page_writes</code>, which is not production friendly either. Normally Postgres writes a full copy of each page to the WAL the first time it’s modified after a checkpoint (for crash-recovery safety); turning this off logs only the partial change.</p>
<p> This makes WAL much <strong>smaller</strong> and <strong>writes faster</strong>.</p>
</li>
</ol>
<p>That’s it I guess, needed to rant about this so I wrote it as an article.</p>
<p>Thanks for coming to my tech talk guys and hope you enjoyed! till the next one</p>
]]></content:encoded></item><item><title><![CDATA[Building a scalable top K using Kafka & Flink]]></title><description><![CDATA[Introduction
What’s happening everyone! In today’s article we’re going to be diving deep into creating a scalable top K list for the most liked videos in a configurable time. Top k questions are widely used in system design interviews and in real lif...]]></description><link>https://hewi.blog/building-a-scalable-top-k-using-kafka-and-flink</link><guid isPermaLink="true">https://hewi.blog/building-a-scalable-top-k-using-kafka-and-flink</guid><category><![CDATA[kafka]]></category><category><![CDATA[apache-flink]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[data structures]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Startups]]></category><category><![CDATA[technology]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Wed, 02 Apr 2025 12:36:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1743597278141/7e6b1692-9fe6-4342-9cb0-518d45abad0d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>What’s happening everyone! In today’s article we’re going to be diving deep into creating a scalable top K list for the most liked videos in a configurable time. Top k questions are widely used in system design interviews and in real life as they provide really valuable insights depending on what domain they are used in.</p>
<p>We’re going to be going through a brief explanation of how Flink works, what the aim of the article is and we’re going to code everything up (I’ll leave the link for the GitHub repo at the end of the article). Let’s begin.</p>
<p>Before moving forward we’re going to imagine a scenario where we were tasked with engineering a top 5 most liked videos problem which moves us to the project requirements section.</p>
<h1 id="heading-project-requirements">Project Requirements</h1>
<p>We’ve been asked by the business team that for insights the following is required:</p>
<ol>
<li><p>View latest Top 5 most liked videos on our platform (Last 5 minutes for example, refreshes every minute)</p>
<ul>
<li>We have a lot of likes coming in per second (around 200~400k/sec)</li>
</ul>
</li>
<li><p>Display them in some fancy frontend</p>
</li>
<li><p>It’s okay if data skews a couple of seconds maximum</p>
</li>
</ol>
<p><strong>Out of Scope for now:</strong></p>
<ol>
<li>Keeping track of historical data for periods of time (we just want the live right now)</li>
</ol>
<p>Now before moving into how we’ll design this thing, let me talk briefly about Flink</p>
<h1 id="heading-flink-brief-intro">Flink Brief Intro</h1>
<p>I’ve written an article on Flink before, if you’re interested in the deep dives: <a target="_blank" href="https://hewi.blog/white-paper-summaries-apache-flink">here</a></p>
<p>But for now I’ll give you what you need to know for this article.</p>
<h2 id="heading-what-is-flink">What is Flink?</h2>
<p>Apache Flink is an open-source system for processing streaming and batch data. It’s highly scalable and fault tolerant and superior in handling massive amounts of data. It’s a great tool for real-time analytics and continuous streams.</p>
<h2 id="heading-how-does-flink-work-very-so-much-simplified">How does Flink work? (Very so much simplified)</h2>
<p>Every Flink job has what’s called a <strong>Job manager</strong>. The job manager can have many <strong>task managers</strong>, and the task managers have what are called <strong>slots</strong>. Every slot can execute some stage of the pipeline, so a slot is basically a unit of execution. It looks something like this (very much abstracted):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743591790721/535cb92d-9d46-4c4f-b3b9-a793717ef1c9.png" alt class="image--center mx-auto" /></p>
<p><mark>Notice the arrows back and forth from Task managers as they communicate and can send and receive data from each other</mark></p>
<p>When you write some code (usually Java) and submit a Flink job (let’s say you want to aggregate counts of videos by their id), Flink takes your code and creates what is called a Dataflow Graph, which it then uses to know how to execute the given job.</p>
<p>For example aggregating some data stream and counting can look like this</p>
<pre><code class="lang-java">stream
  .keyBy(video -&gt; video.getId()) # Group by id
  .window(TumblingProcessingTimeWindows.of(Time.minutes(<span class="hljs-number">1</span>)))
  .sum(<span class="hljs-number">1</span>);
</code></pre>
<p>Behind the scenes, Flink translates this into a <strong>directed acyclic graph (DAG)</strong> where each node represents an operation (operator) — like <code>keyBy</code>, <code>window</code>, or <code>sum</code> — and the edges represent the data flowing between them.</p>
<p>This <strong>Dataflow Graph</strong> becomes the blueprint for execution. It helps Flink decide:</p>
<ul>
<li><p>How to <strong>parallelize</strong> the work (e.g., how many tasks should do the aggregation)</p>
</li>
<li><p>How to <strong>shuffle</strong> or <strong>partition</strong> the data (e.g., based on the key) (data flowing between task managers)</p>
</li>
<li><p>Where <strong>state</strong> needs to be managed (e.g., window state, accumulators) (in the example above a tumbling window every minute to the minute)</p>
</li>
<li><p>And how to <strong>recover</strong> from failures by tracking operator state and checkpoints</p>
</li>
</ul>
<p>This is what enables Flink to scale your job from running on your laptop to a 100-node cluster with minimal changes to your code.</p>
<p>The Flink Web UI even lets you inspect this graph visually, showing each operator and its parallelism, helping you understand exactly how your job flows end to end.</p>
<p>Scaling Flink is a whole different story. Depending on your scale, you’ll need to decide on things like the <strong>parallelism</strong> of your job, the <strong>number of TaskManagers</strong>, and the <strong>slots per TaskManager</strong>.</p>
<p>The <strong>parallelism</strong> defines how many parallel instances will be created for <strong>each operator</strong> in your job’s Dataflow Graph. Think of it like this: for every step in your pipeline (e.g., map, keyBy, window, etc.), Flink can spawn multiple subtasks to process data in parallel. The higher the parallelism, the more throughput your job can handle — assuming your hardware can keep up.</p>
<p>You can scale horizontally by:</p>
<ul>
<li><p>Increasing the <strong>number of TaskManagers</strong> (i.e., more JVMs on more machines)</p>
</li>
<li><p>Assigning more <strong>task slots per TaskManager</strong> (i.e., more threads to run subtasks)</p>
</li>
</ul>
<p>Flink’s job manager then maps the parallel subtasks of each operator onto these slots across TaskManagers, distributing the work evenly. If our stream source is Kafka, matching at least the topic’s partition count is a good starting point for parallelism.</p>
<p>Now that we got a high level on how Flink works, let’s talk about how we’ll design this thing</p>
<h1 id="heading-technical-design">Technical Design</h1>
<p>Let’s say we have 2 Kafka partitions and our parallelism is set to 2 (2 subtasks for every single operator/step). The goal is to aggregate the likes into counts by video id and find a way to keep only the top 5. How can we do that?</p>
<p>Well a popular data structure for calculating top K from a bunch of elements is a <a target="_blank" href="https://www.youtube.com/watch?v=wptevk0bshY"><mark>PriorityQueue</mark></a> (Min-heap) but how can we leverage it in our design?</p>
<p>We’re actively reading data like this from the 2 Kafka partitions: <code>{video_id: 12332}</code></p>
<p>We need to do the following:</p>
<ol>
<li><p>Aggregate (Count) all same video ids</p>
</li>
<li><p>Push them into a Priority Queue of size 5 (fixed size/memory)</p>
</li>
<li><p>Return the result</p>
</li>
</ol>
<p>However, the steps above are missing something: we have 2 subtasks for aggregating the counts. This means we have one of two approaches here:</p>
<ol>
<li><p>After aggregating push them all to a single node and generate a top k</p>
</li>
<li><p>Generate local top k’s for every part and then send them to a node to generate a global top k</p>
</li>
</ol>
<p>The first approach, while simpler, can overload the final destination node with a lot of data, which can eat up its memory. Imagine having millions of aggregated records sent to a single node.</p>
<p>The second approach is more <strong>efficient:</strong> we’ll generate a local top k for every aggregated part, send these over (only a fixed size of 5 per node), and generate a top K out of the received local top K’s</p>
<p>The design looks something like this (this is the technical thinking, not how Flink will physically execute it)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743593867081/f410d15d-489b-4874-a30c-d0154392f1d0.png" alt class="image--center mx-auto" /></p>
<p>Now since our requirements are not strict and we don’t need history, every output of the global top k will be sent to <strong>Redis,</strong> so the user can fetch it easily and display it. The key in Redis gets updated every minute with the latest 5-minute window in a sliding window pattern (the Flink approach we’ll talk about in aggregation), so the design we’ll go with finalizes to this</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743594020608/34830ee4-d0be-49b8-b9bf-da126f300bc4.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-stages-in-flink">Stages in Flink</h1>
<blockquote>
<p>Code here is highly abstracted; the repo link will be at the end of the article</p>
</blockquote>
<h2 id="heading-define-source">Define source</h2>
<p>In Flink it goes Source → some operations → sink (destination)</p>
<p>We need to define Kafka as our source of data and this can be done with something like this</p>
<pre><code class="lang-java">        props.setProperty(<span class="hljs-string">"bootstrap.servers"</span>, <span class="hljs-string">"kafka:9092"</span>);
        props.setProperty(<span class="hljs-string">"group.id"</span>, <span class="hljs-string">"video-likes-consumer"</span>);
        FlinkKafkaConsumer&lt;String&gt; consumer = <span class="hljs-keyword">new</span> FlinkKafkaConsumer&lt;&gt;(
                <span class="hljs-string">"video_likes"</span>,
                <span class="hljs-keyword">new</span> SimpleStringSchema(),
                props
        );

        DataStream&lt;String&gt; stream = env.addSource(consumer);
</code></pre>
<p>We’re now consuming from the topic <code>video_likes</code></p>
<h2 id="heading-steps-aggregation-to-top-k">Steps (Aggregation to Top K)</h2>
<p>Now, the best thing about Flink is the different APIs it offers: out-of-the-box APIs that do different types of aggregations. Since our requirement is the most liked videos over the latest 5 minutes, refreshing every minute, we can use a sliding window approach (e.g. 0-5, 1-6, 2-7, etc.): as the window moves, we get the latest 5-minute window only. Before doing that we would need to JSON-parse the stream output to extract the video id (view in repo). The important part here is the aggregation steps.</p>
<pre><code class="lang-java">        likes
        .keyBy(value -&gt; value.f0)
        .window(SlidingProcessingTimeWindows.of(Time.minutes(<span class="hljs-number">5</span>), Time.minutes(<span class="hljs-number">1</span>))) 
        .aggregate(<span class="hljs-keyword">new</span> LocalTopKAggregator(<span class="hljs-number">5</span>))
</code></pre>
<p>It starts off like this: <code>keyBy</code> groups the stream and potentially reshuffles data between TaskManagers so that all records with the same id land in, and are processed by, the same subtask.</p>
<p>Then we specify the window we’ll be working on (in our case a sliding window of 5 minutes that moves every minute)</p>
<p>Now <code>aggregate</code> executes continuously as stream data comes in. It’s a custom class that inherits from Flink’s default aggregation and adds the local priority queue logic on top of it. You’ll find the code for this in the repository for a more in-depth look.</p>
<p><code>LocalTopKAggregator</code> class has a method <code>getResult</code> which executes <strong>per window</strong></p>
<p>This is the method that generates the top K per window.</p>
<pre><code class="lang-java">    <span class="hljs-keyword">public</span> List&lt;Tuple2&lt;String, Integer&gt;&gt; getResult(Map&lt;String, Integer&gt; acc) {
        PriorityQueue&lt;Tuple2&lt;String, Integer&gt;&gt; pq = <span class="hljs-keyword">new</span> PriorityQueue&lt;&gt;(Comparator.comparingInt(t -&gt; t.f1));
        <span class="hljs-keyword">for</span> (Map.Entry&lt;String, Integer&gt; entry : acc.entrySet()) {
            pq.offer(Tuple2.of(entry.getKey(), entry.getValue()));
            <span class="hljs-keyword">if</span> (pq.size() &gt; k) pq.poll();
        }

        List&lt;Tuple2&lt;String, Integer&gt;&gt; result = <span class="hljs-keyword">new</span> ArrayList&lt;&gt;(pq);
        result.sort((a, b) -&gt; Integer.compare(b.f1, a.f1));
        <span class="hljs-keyword">return</span> result;
    }
</code></pre>
<p>Our output per subtask is now the local top K; next we need to group these somewhere to generate the final top K</p>
<pre><code class="lang-java">        likes
        .keyBy(value -&gt; value.f0)
        .window(SlidingProcessingTimeWindows.of(Time.minutes(<span class="hljs-number">5</span>), Time.minutes(<span class="hljs-number">1</span>)))
        .aggregate(<span class="hljs-keyword">new</span> LocalTopKAggregator(<span class="hljs-number">5</span>))

        .map(list -&gt; Tuple2.of(<span class="hljs-string">"global"</span>, list))
         <span class="hljs-comment">// Java erasure ***not business logic***   </span>
        .returns(<span class="hljs-keyword">new</span> TypeHint&lt;Tuple2&lt;String, List&lt;Tuple2&lt;String, Integer&gt;&gt;&gt;&gt;() {})
        .keyBy(t -&gt; t.f0)
        .window(SlidingProcessingTimeWindows.of(Time.minutes(<span class="hljs-number">5</span>), Time.minutes(<span class="hljs-number">1</span>)))
        .process(<span class="hljs-keyword">new</span> GlobalTopKMerge(<span class="hljs-number">5</span>));
</code></pre>
<p>Now we map over our generated local top K’s and give them all the same key, <code>global</code>. (The <code>TypeHint</code> in <code>returns</code> is Java-specific boilerplate that pins down the return type of the <code>map</code> above due to type erasure.)</p>
<p>We group by the key <strong>global</strong></p>
<p>So now we have the key <strong>global</strong> along with an array of local top k’s</p>
<p>We only need to process the latest local top k’s so we add another <code>window</code></p>
<p>If we don’t add this, we’ll keep feeding the global top k with all the windows (it would keep the history of previous windows instead of removing them)</p>
<p>Now <code>GlobalTopKMerge</code> merges all the top K’s respectively and then pushes the result to redis.</p>
<pre><code class="lang-java">    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">process</span><span class="hljs-params">(String key,
                        Context context,
                        Iterable&lt;Tuple2&lt;String, List&lt;Tuple2&lt;String, Integer&gt;&gt;&gt;&gt; elements,
                        Collector&lt;Void&gt; out)</span> </span>{

        PriorityQueue&lt;Tuple2&lt;String, Integer&gt;&gt; heap = <span class="hljs-keyword">new</span> PriorityQueue&lt;&gt;(Comparator.comparingInt(t -&gt; t.f1));

        <span class="hljs-keyword">for</span> (Tuple2&lt;String, List&lt;Tuple2&lt;String, Integer&gt;&gt;&gt; localTopK : elements) {
            <span class="hljs-keyword">for</span> (Tuple2&lt;String, Integer&gt; entry : localTopK.f1) {
                heap.offer(entry);
                <span class="hljs-keyword">if</span> (heap.size() &gt; k) {
                    heap.poll();
                }
            }
        }

        List&lt;Tuple2&lt;String, Integer&gt;&gt; finalTopK = <span class="hljs-keyword">new</span> ArrayList&lt;&gt;(heap);
        finalTopK.sort((a, b) -&gt; Integer.compare(b.f1, a.f1));

        pushToRedis(finalTopK);
    }
</code></pre>
<h2 id="heading-sink">Sink</h2>
<p>Finally, <code>pushToRedis</code> sets a simple key on Redis that the consumer reads from:</p>
<pre><code class="lang-java">    <span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">void</span> <span class="hljs-title">pushToRedis</span><span class="hljs-params">(List&lt;Tuple2&lt;String, Integer&gt;&gt; topK)</span> </span>{
      <span class="hljs-keyword">try</span> {
        String redisKey = <span class="hljs-string">"trending:5min"</span>;
        String json = <span class="hljs-keyword">new</span> ObjectMapper().writeValueAsString(topK);
        jedis.set(redisKey, json);
      } <span class="hljs-keyword">catch</span> (Exception e) {
        System.out.println(<span class="hljs-string">"Error pushing to Redis: "</span> + e.getMessage());
      }
    }
</code></pre>
<h1 id="heading-summary">Summary</h1>
<p>Data has become an integral part of today’s modern world, and the insights it gives can be the difference in making millions, especially for startups. The best and most fun thing about system design is that there is never a one-size-fits-all: different approaches have different trade-offs, and the business is an integral part of knowing which direction to head. Scaling this project can be a separate article if you like. If you made it here, thank you, and I hope you learned something valuable today. See you all in the next one!</p>
<h1 id="heading-github">Github</h1>
<ul>
<li><a target="_blank" href="https://github.com/amrelhewy09/topK_kafka_flink.git">https://github.com/amrelhewy09/topK_kafka_flink.git</a></li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Probabilistic Data Structures Part 1]]></title><description><![CDATA[Introduction
Today is all about probabilistic data structures, probabilistic data structures are data structures that use randomization and approximation to achieve efficient storage and processing of large-scale data. These structures typically trad...]]></description><link>https://hewi.blog/probabilistic-data-structures-part-1</link><guid isPermaLink="true">https://hewi.blog/probabilistic-data-structures-part-1</guid><category><![CDATA[data structures]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[scalability]]></category><category><![CDATA[Programming Blogs]]></category><category><![CDATA[Programming Tips]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[software development]]></category><category><![CDATA[AWS]]></category><category><![CDATA[AI]]></category><category><![CDATA[Problem Solving]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Fri, 21 Mar 2025 22:37:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742596574090/2a8b34a5-6359-4de2-ac1e-14b5628aaf5b.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Today is all about probabilistic data structures, <strong>probabilistic data structures</strong> are data structures that use <strong>randomization</strong> and <strong>approximation</strong> to achieve <strong>efficient storage and processing</strong> of large-scale data. These structures typically trade off <strong>perfect accuracy</strong> for <strong>lower memory usage</strong> and <strong>faster performance</strong>, making them useful for handling big data, streaming data, and distributed systems.</p>
<p>Examples would be getting an approximate count across millions of records, estimating the cardinality of a set of data, or checking if something exists amongst millions of records. The aim is to sacrifice perfect accuracy for scalability, and that is of course only if the business is okay with something like this.</p>
<p>I’ll be going through the ones I’ve used and some more too. They are as follows:</p>
<ol>
<li><p>Count-Min Sketch ✅</p>
</li>
<li><p>Bloom Filter ✅</p>
</li>
<li><p>HyperLogLog ✅</p>
</li>
<li><p>Skip Lists ✅</p>
</li>
<li><p>K-Minimum Values</p>
</li>
<li><p>LogLog &amp; SuperLogLog</p>
</li>
<li><p>Top K &amp; Heavy Hitters</p>
</li>
</ol>
<h1 id="heading-count-min-sketch">Count-Min Sketch</h1>
<p>Let’s say we are receiving a continuous stream of data <code>{"video_id": 1}</code> which are view counts, and we’re required to sum them up. The most straightforward way of doing this is by using a HashMap where we’d have the <code>video_id</code> as a key and the value being the total count.</p>
<p>The problem here is that the videos are a lot and the hash map size would increase significantly potentially being memory inefficient. If the business requirement was not strict on showing the exact view counts then count min sketch is the way to go.</p>
<p><strong>Count-Min Sketch</strong> is a probabilistic data structure that provides an <strong>approximate frequency count</strong> of elements in a data stream using <strong>constant space</strong>.</p>
<p>Instead of storing each video’s view count in a <strong>HashMap</strong>, which grows as the number of videos increases, we use a <strong>2D array (matrix)</strong> of counters with multiple hash functions to track approximate counts.</p>
<p><strong>Step by step breakdown:</strong></p>
<ol>
<li><p><strong>Initialize a matrix (w × d)</strong></p>
<ul>
<li><p><code>w</code> columns (width): Represents the number of counters in each row.</p>
</li>
<li><p><code>d</code> rows (depth): Each row corresponds to a different hash function.</p>
</li>
<li><p>All counters start at <code>0</code>.</p>
</li>
</ul>
</li>
<li><p><strong>Updating the Count</strong></p>
<ul>
<li><p>When a new view event <code>{ "video_id": 1 }</code> arrives:</p>
<ul>
<li><p>The <strong>video ID</strong> is hashed using <code>d</code> different hash functions.</p>
</li>
<li><p>Each hash function maps the video ID to a column in its respective row.</p>
</li>
<li><p>The counters at those positions are incremented.</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>Querying the Count</strong></p>
<ul>
<li><p>To estimate the view count of a <strong>specific video ID</strong>, hash it with the same <code>d</code> hash functions and mod it with the column length. The more columns you add the more accuracy but also more space.</p>
</li>
<li><p>Look at the corresponding positions in each row and take the <strong>minimum</strong> value across all rows.</p>
</li>
<li><p>This minimizes the effect of hash collisions (hence "count-min").</p>
</li>
</ul>
</li>
</ol>
<p>The reason we pick the minimum value is to minimize the effect of hash collisions, where different video ids might hit the same row &amp; column. A collision can only inflate a counter, never decrease it, so the minimum across rows is the least-overestimated count. Sometimes we can also use 5 hash functions instead of 3, which would increase the accuracy. But increasing the number of hash functions isn’t always beneficial, as there is a point where adding more won’t offer any significant improvement in accuracy. Below is a gif example of how it works, followed by a small sketch of the structure in code.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742426637625/df800f12-0d98-42bf-8fdd-3f3c72d9b1fa.gif" alt class="image--center mx-auto" /></p>
<h1 id="heading-bloom-filters">Bloom Filters</h1>
<p>Another cool probabilistic data structure now: a Bloom filter is a space-efficient data structure that tells you if an element is 100% not in a set of elements, or if it <strong>maybe</strong> is.</p>
<p>Meaning it may give false positives but never a false negative, so an element either definitely doesn’t exist or maybe exists.</p>
<p>Similar to the idea of Count-Min Sketch, a value gets passed through multiple hash functions and the results map to different indexes in a <strong>fixed-size bit array, and the bits at these indexes are set to 1.</strong></p>
<p>So if we’re looking for a value, we pass it through the same hash functions and check if the bits at the respective indices are 1 or not. If there are any zeroes then it’s 100% not in the set of elements, but if they’re all ones then it may be in the set, or may have just collided with other members of the set.</p>
<p>Here’s how they work</p>
<p><img src="https://s8.ezgif.com/tmp/ezgif-880eeb815c8b60.gif" alt="[animate output image]" /></p>
<h1 id="heading-skip-lists">Skip Lists</h1>
<p>Skip lists are really efficient data structures that help optimize searching, insertion and deletion in linked lists. In sorted linked lists finding elements will always take O(n) time because you have to traverse to reach the required node. However a way to optimize that is by using skip lists.</p>
<p>They are simply levels stacked on top of a basic linked list (the levels are called express lanes, i.e. faster routes), something like this</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742595851014/2120f65e-4b9a-4172-afec-a586c66f9485.png" alt class="image--center mx-auto" /></p>
<p>And when traversing we start from the top left 1 and check the next station in the express lane. So for example if we’re looking for the value 5</p>
<ol>
<li><p>Start at the top-left value and check its next station in the express lane, 4</p>
</li>
<li><p>since 5 &gt; 4 we go through the express lane</p>
</li>
<li><p>we check the next express lane station 6</p>
</li>
<li><p>since 6 &gt; 5 we drop down and traverse the basic linked list till we find 5</p>
</li>
</ol>
<p>This can go on for multiple levels, e.g. having a 1 → 5 layer above the one in the image above. This helps skip a lot of nodes, which effectively optimizes search, insertion and deletion time.</p>
<p>Here’s an example if we’re looking for the value 5 in the skip list.</p>
<p><img src="https://s8.ezgif.com/tmp/ezgif-86c31f6f190f4d.gif" alt="[animate output image]" /></p>
<h1 id="heading-hyperloglog">HyperLogLog</h1>
<p>This one is all about cardinality: it gives you an estimate of how many unique items exist in a set of elements, and it is memory-efficient.</p>
<p>Think of a <strong>lottery</strong> where you randomly pick a number between 1 and 100.</p>
<ul>
<li><p>If you only draw <strong>5 numbers</strong>, you probably won’t get anything close to 100.</p>
</li>
<li><p>If you draw <strong>a million numbers</strong>, chances are, you'll eventually get 100.</p>
</li>
</ul>
<p>🔸 The more numbers you pick, the higher the chance of getting a <strong>big</strong> number.</p>
<p>HyperLogLog works the same way, but instead of picking numbers, it’s looking at <strong>how many leading zeroes</strong> appear in a hash.</p>
<ol>
<li><p>Each and every element gets hashed into a random <strong>binary hash (a binary representation of a hashed value)</strong></p>
</li>
<li><p>Once we start hashing elements, we look at <strong>where the first</strong> <code>1</code> appears in the binary hash and remember the <strong>biggest number we've seen so far</strong>.</p>
</li>
</ol>
<p><strong>The more unique values we hash, the higher the chance that one of them has many leading zeroes</strong>.</p>
<p><strong>If you’ve seen a hash with a</strong> <code>1</code> at position 7, that means you must have seen a lot of unique elements to get such a rare case.</p>
<p>HyperLogLog <strong>tracks the largest position seen</strong> and uses math to estimate how many unique elements must exist for that to happen.</p>
<p>Now that items are hashed, they are split into different buckets. Buckets help smooth out extreme cases where, for example, we have 3 items but by pure luck a leftmost 1 was found at the 7th position in one of them. Let me explain.</p>
<p>Buckets are <strong>small memory slots</strong> where HyperLogLog stores information <strong>separately for different groups of elements</strong>.</p>
<p>Imagine you're counting <strong>unique people</strong> entering a stadium.</p>
<ul>
<li><p>If you only look at <strong>one entrance</strong>, you might get a <strong>bad estimate</strong> because not all people use that entrance.</p>
</li>
<li><p>Instead, you <strong>track multiple entrances separately</strong> and combine the results to get a <strong>better</strong> estimate.</p>
</li>
</ul>
<p>Each bucket <strong>only remembers the biggest value</strong> it has seen.</p>
<p>Now, instead of estimating based on <strong>one extreme value</strong>, we <strong>take an average of all the buckets</strong>.</p>
<p>For example:</p>
<ul>
<li><p>If all buckets saw <strong>only small values (1, 1, 2)</strong>, we probably have <strong>few unique elements</strong>.</p>
</li>
<li><p>If some buckets saw <strong>high values (like 5, 6, 7)</strong>, that suggests we have <strong>a lot of unique elements</strong>.</p>
</li>
</ul>
<p>It’s kind of confusing, but the more buckets you use, the better accuracy you get</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742594966180/a9e803c2-5fd9-43ff-9eb1-365003b5c386.png" alt class="image--center mx-auto" /></p>
<p>Then we proceed to take the average of all the buckets (specifically a harmonic mean) to estimate the number of unique elements using a mathematical formula. Here’s a small sketch of the whole idea in code.</p>
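<p>This is a rough illustration only (a real HyperLogLog adds small- and large-range corrections), assuming 64 buckets; the constant 0.709 is the standard bias-correction value for that bucket count.</p>
<pre><code class="lang-go">package main

import (
    "fmt"
    "hash/fnv"
    "math"
    "math/bits"
)

const p = 6      // the first p bits of the hash pick a bucket
const m = 1 &lt;&lt; p // 64 buckets

// add routes the item to a bucket and keeps the largest
// "position of the first 1" seen in the remaining bits.
func add(buckets []uint8, item string) {
    h := fnv.New64a()
    h.Write([]byte(item))
    x := h.Sum64()
    bucket := x &gt;&gt; (64 - p)
    rank := uint8(bits.LeadingZeros64(x&lt;&lt;p|1)) + 1
    if rank &gt; buckets[bucket] {
        buckets[bucket] = rank
    }
}

// estimate combines the buckets with a harmonic mean.
func estimate(buckets []uint8) float64 {
    sum := 0.0
    for _, r := range buckets {
        sum += math.Pow(2, -float64(r))
    }
    alpha := 0.709 // bias correction for m = 64
    return alpha * m * m / sum
}

func main() {
    buckets := make([]uint8, m)
    for i := 0; i &lt; 10000; i++ {
        add(buckets, fmt.Sprintf("user-%d", i))
    }
    fmt.Printf("estimated cardinality: %.0f\n", estimate(buckets))
}
</code></pre>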
]]></content:encoded></item><item><title><![CDATA[Scaling Rails: Understanding Puma Workers, Threads, and Database Connection Pooling]]></title><description><![CDATA[Introduction
Hello folks! in this article I’m going to be going through all the needed calculations to properly tune your rails app in production. This article aims to provide a schematic for calculating the threads needed both application wise and d...]]></description><link>https://hewi.blog/scaling-rails-understanding-puma-workers-threads-and-database-connection-pooling</link><guid isPermaLink="true">https://hewi.blog/scaling-rails-understanding-puma-workers-threads-and-database-connection-pooling</guid><category><![CDATA[Rails]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[backend]]></category><category><![CDATA[scalability]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Mon, 20 Jan 2025 12:07:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1737374722551/ae6d267f-f6e7-4f9b-9655-f50bbc82f4a4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Hello folks! In this article I’m going to go through all the calculations needed to properly tune your Rails app in production. This article aims to provide a schematic for calculating the threads needed, both application-wise and database-wise, so no request or background job fails to acquire a database connection when things get heated. Let’s start.</p>
<h1 id="heading-puma-and-its-workers">Puma and its workers</h1>
<p>First of all, let’s start off by talking about Puma, the default Rails web server, and how it operates generally.</p>
<p>When booting up Puma, it has configurations for the number of <code>workers</code> and the <code>number of threads per worker</code>. But what are both?</p>
<p>When setting the <code>worker</code> variable to 2, for example, Puma will fork its operating system process however many times you set <code>worker</code> (in our case 2). This means you will have <code>workers</code>-many instances of your Rails code ready to serve HTTP requests.</p>
<p>In each puma worker there will be multiple threads based on the <code>threads</code> configuration. However due to the GIL lock in ruby, only one thread can be executed at a moment of time <strong>unless this thread is doing some blocking operation (I/O) then the GIL lock is released and other threads can run safely.</strong></p>
<p>So far that means if we have an instance with 2 workers and 2 threads per worker we will have the following number of threads:</p>
<blockquote>
<p>Total number of threads = worker count × threads per worker = 2 × 2 = 4</p>
</blockquote>
<p>That means for each thread we need to reserve a potential <strong>connection from the database connection pool, because every application thread might connect to the database at some point depending on the load.</strong></p>
<p>Rails maintains its own <strong>database connection pool</strong>, with a new pool created for each worker process. Threads within a worker operate on the same pool. If a Puma worker utilizes 5 threads, then database.yml must be configured with a connection pool of 5, since each thread could possibly establish a database connection.</p>
<p>Since each Worker is spawned by a system fork(), the new worker will have its own set of 5 threads to work with, and thus for the new Rails instance created, the database.yml will still be set to a connection pool of 5.</p>
<h1 id="heading-rails-database-connection-pool-vs-actual-database-pool">Rails Database connection pool VS Actual Database Pool</h1>
<p>It’s important before moving forward to be able to differentiate between these two, as they might confuse a lot of people. Each Rails app maintains its own database connection pool. This is nothing but a pool of connections reserved for the database: when needed, a connection gets picked from the pool, does its job, and goes back to be reused later on. Its size is by default set by <code>&lt;%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %&gt;</code>, which matches the max threads in a single Puma worker. Meaning for every instance of the Ruby code invoked, we will have a DB pool of the size specified in this env var.</p>
<p>The actual database pool, for example the PostgreSQL connection pool (or the connections PostgreSQL accepts), defines how many concurrent connections the PostgreSQL server can handle across all clients. It is controlled in <code>postgresql.conf</code> by the <code>max_connections</code> parameter, which we can alter accordingly.</p>
<p>This will come in handy later on so make sure you understand the difference before proceeding.</p>
<h1 id="heading-sidekiq">Sidekiq</h1>
<p>Now let’s imagine this scenario.</p>
<ol>
<li><p>Postgres <code>max_connections</code> is set to 100</p>
</li>
<li><p>We have a rails app operated by Puma web server and the config is as follows:</p>
<ol>
<li><p><code>workers</code> is set to 5</p>
</li>
<li><p><code>max_threads</code> is set to 20</p>
</li>
</ol>
</li>
</ol>
<p>Knowing this information, we have 100 <code>max_connections</code> on the Postgres side and <code>5×20=100</code> threads potentially connecting to Postgres from the application side. <strong>They are the same count</strong></p>
<p>Now comes in a background job process such as Sidekiq. Sidekiq has its own completely separate configuration regarding concurrency, where <code>concurrency</code> is the number of threads Sidekiq operates on.</p>
<p>Threads in Ruby operate under fundamentally different paradigms, largely due to the <strong>Global Interpreter Lock (GIL)</strong> in MRI Ruby. In Rails (running under a server like Puma), each thread is responsible for handling <strong>one request at a time</strong>. There is no true concurrency for <strong>CPU-bound</strong> tasks because of the <strong>GIL</strong> in MRI Ruby. Threads in Rails can perform concurrent operations when waiting for <strong>I/O-bound tasks</strong> (e.g., database queries, external API calls). During such waiting periods, other threads can process requests. A thread finishes one request entirely before moving on to the next.</p>
<p>In Sidekiq the GIL lock is still there but the nature of <code>job processing</code> makes it feel more concurrent. Because job processing might have a lot more I/O (db access, external API calls) it has a lot more context switching between threads than the web server (GIL gets released).</p>
<p>When tuning the postgres <code>max_connections</code> we need to make sure that we take sidekiq into consideration.</p>
<p>The equation becomes as follows:</p>
<blockquote>
<p>pg_max_connections = (Puma worker count * RAILS_MAX THREADS) + (Sidekiq concurrency * Sidekiq process count)</p>
</blockquote>
<p>Meaning if we have 1 Sidekiq process with a concurrency of <code>5</code>, we need to increase <code>max_connections</code> from 100 to 105 so each thread can potentially grab a connection to the database.</p>
<h1 id="heading-summary">Summary</h1>
<p>When stress testing and scaling a Rails application, it’s crucial to understand how all components work together seamlessly. A lack of database connections leading to 500 errors is one of the worst experiences a user can face, and addressing this should be a top priority. By gaining deeper insights into these mechanisms, you can ensure your application is resilient, scalable, and user-friendly. I hope this article has helped clarify these concepts, and I look forward to sharing more in the next one!</p>
<p>Also subscribe to the YouTube channel; I’ll be doing more than Leetcode over there soon 😄<a target="_blank" href="https://www.youtube.com/@techstuffrandom">https://www.youtube.com/@techstuffrandom</a></p>
]]></content:encoded></item><item><title><![CDATA[How does Postgres persist to disk? What is WAL all about?]]></title><description><![CDATA[Introduction
Hello folks! in this quick article i’m going to be talking about how a database like Postgres actually persists to disk and what happens behind the scenes. What is WAL all about and what does it even stand for? Let’s dive in.
When it com...]]></description><link>https://hewi.blog/how-does-postgres-persist-to-disk-what-is-wal-all-about</link><guid isPermaLink="true">https://hewi.blog/how-does-postgres-persist-to-disk-what-is-wal-all-about</guid><category><![CDATA[PostgreSQL]]></category><category><![CDATA[software development]]></category><category><![CDATA[backend]]></category><category><![CDATA[engineering]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Sat, 11 Jan 2025 10:53:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1736592806598/88779235-b130-44e7-8f88-d4e0605a6c91.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Hello folks! In this quick article I’m going to be talking about how a database like Postgres actually persists to disk and what happens behind the scenes. What is WAL all about, and what does it even stand for? Let’s dive in.</p>
<p>When it comes to I/O operations (in our case writing to disk). Optimizations become a must because I/O operations are very expensive especially writing to disk due to a lot of factors. Postgres tackles this problem in a clever manner.</p>
<p>When a write request is made to the database, You’d think that it would persist it to disk right away, but that’s not what actually happens behind the scenes. Introducing <strong>Postgres Shared Buffers</strong></p>
<h1 id="heading-shared-buffers">Shared Buffers</h1>
<p>When a <strong>write</strong> is made, it isn’t instantly flushed to disk; Postgres actually loads the page impacted by the write into memory and adjusts it there.</p>
<blockquote>
<p>Postgres saves rows in pages, and whenever we update a certain row we find the page it’s in, load it into memory, and update the page with the new record. Same for writes, whether it’s a new page or an existing one that’s not yet full.</p>
</blockquote>
<p>This is called a <code>dirty page</code>: it needs to be written back to disk later. Postgres relies on <strong>background processes</strong> like the <strong>background writer</strong> or <strong>checkpoints</strong> to write dirty pages from shared buffers to disk asynchronously. (We’ll get into that)</p>
<p>But the thing is, this approach hugely minimizes I/O by batching the flush operation to disk instead of having it happen for each and every write.</p>
<p>On the other hand, when a read request is made Postgres first checks the shared buffers for the pages being requested (whether dirty or not). If found it then proceeds to serve it from there, if not it loads the page from disk and adds it to the shared buffers cache. If the shared buffer is full a <strong>victim dirty page</strong> will be written to disk and the newly read page will be replaced by it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736590638514/5ecc5857-5281-41eb-b582-9969f30b1188.png" alt class="image--center mx-auto" /></p>
<p>Now the question that arises is: what happens if you lose power halfway through writing to the data files? Let’s say some write was made to memory, and during flushing it to disk a power cut happened, leaving that data inconsistent. The client still thinks they made that write when in reality it never fully persisted to disk. This can cause a lot of data corruption and integrity loss. Hence, introducing WAL.</p>
<h1 id="heading-wal-write-ahead-log">WAL (Write Ahead Log)</h1>
<p>WAL or Write Ahead Log is a mechanism to ensure the consistency and safety of data. It is a technique where every change to the database is logged <strong>before</strong> it is applied to the actual data files. This log acts as a journal that records all modifications. It is an append only log that logs everything a user writes to the database while simultaneously updating the data pages in shared buffer as mentioned before. Any newly written data can remain in the shared buffer as long as we have a log that tracked the change. In case anything goes wrong we can reconstruct the state from the log.</p>
<p>The WAL is much smaller than the actual data files, and so we can flush them relatively faster. They are also sequential unlike data pages which are random. Disks love sequential writes, which is another benefit of WAL.</p>
<p>Every WAL entry is first written to a <code>WAL buffer</code> in memory. Then, when a certain trigger fires, this buffer gets flushed to disk. The trigger can either be the buffer reaching a certain size or a certain period passing; these are all configurable on Postgres’s side. Once flushed to disk, these entries can be considered <strong>committed.</strong></p>
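<p>To make the ordering concrete, here’s a toy write path in Go (an illustration of the WAL rule, not Postgres internals): the change is appended to the log and fsynced before the in-memory page is touched. A real database batches these syncs rather than paying one per write.</p>
<pre><code class="lang-go">package main

import (
    "fmt"
    "os"
)

// db is a toy: the WAL file plus in-memory "pages" (our shared buffers).
type db struct {
    wal   *os.File
    pages map[string]string
}

// write logs first, makes the record durable, and only then
// mutates the in-memory page, which is flushed to data files later.
func (d *db) write(key, val string) error {
    record := fmt.Sprintf("SET %s=%s\n", key, val)
    if _, err := d.wal.WriteString(record); err != nil {
        return err
    }
    if err := d.wal.Sync(); err != nil { // fsync: the change is now recoverable
        return err
    }
    d.pages[key] = val // the page is now dirty
    return nil
}

func main() {
    f, err := os.OpenFile("wal.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
    if err != nil {
        panic(err)
    }
    defer f.Close()
    d := &amp;db{wal: f, pages: map[string]string{}}
    if err := d.write("user:1", "hewi"); err != nil {
        panic(err)
    }
    fmt.Println("committed:", d.pages["user:1"])
}
</code></pre>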
<p>The database can crash after writing WAL entries (before flushing shared buffer to disk), that is fine, as long we know the transaction state belonging to each WAL entry we can discard or omit uncommitted WAL entries upon recovery (to ensure data consistency).</p>
<p>For example, if you are in the middle of a transaction and the database crashed, we consider the transaction rolled back by default. I will do another article explaining how WAL actually writes transactions.</p>
<p>When all the data files have been flushed and updated to reflect the information in the WAL, this is called <strong>checkpointing.</strong> Once this happens, a <strong>checkpoint record</strong> is written to the Write-Ahead Log (WAL), marking the point up to which all changes have been flushed to disk.</p>
<p>In the event of a crash, the crash recovery procedure looks at the latest checkpoint record to determine the point in the WAL (known as the redo record) from which it should start the REDO operation.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736592687063/3c59cbe4-346b-42f4-b75b-b9dfd5036037.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-references">References</h1>
<ol>
<li><p><a target="_blank" href="https://x.com/hnasr/status/1867253354662920379">https://x.com/hnasr/status/1867253354662920379</a></p>
</li>
<li><p><a target="_blank" href="https://www.postgresql.org/docs/current/wal-configuration.html">https://www.postgresql.org/docs</a></p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[UUID, ULID, NanoIDs and Snowflake IDs , What's the difference?]]></title><description><![CDATA[Hello everyone and welcome to the first article of 2025! In this article we’re going to be talking all about unique id generators. Different schemes that generate unique ids for a specific scope whether global or custom. Let’s start by discussing eac...]]></description><link>https://hewi.blog/uuid-ulid-nanoids-and-snowflake-ids-whats-the-difference</link><guid isPermaLink="true">https://hewi.blog/uuid-ulid-nanoids-and-snowflake-ids-whats-the-difference</guid><category><![CDATA[software development]]></category><category><![CDATA[backend]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Wed, 01 Jan 2025 15:14:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1735744418527/11bbf511-9dc4-4748-a38e-046a3205cbba.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello everyone and welcome to the first article of 2025! In this article we’re going to be talking all about unique id generators. Different schemes that generate unique ids for a specific scope whether global or custom. Let’s start by discussing each one separately and having a pro con list at the end that might guide you to the answer of which one do I use? Let’s start</p>
<h1 id="heading-uuids">UUIDs</h1>
<p>UUID stands for universally unique identifier. UUIDs allow generating ids in a way that guarantees uniqueness without knowledge of other systems. They have many versions, some of which are not used much anymore; let’s discuss each version</p>
<h2 id="heading-uuidv1">UUIDv1</h2>
<p>A UUID version 1 is known as a time-based UUID and can be broken down as follows:</p>
<p><code>UUIDv1: [time_low]-[time_mid]-[time_high_and_version]-[clock_seq_and_reserved]-[node]</code></p>
<p>UUIDv1 uses a <strong>60-bit timestamp</strong> that represents the number of <strong>100-nanosecond intervals</strong> elapsed since the Gregorian epoch: <strong>15 October 1582 00:00:00 UTC</strong>. This timestamp is used to ensure that UUIDs generated at different times are unique. e.g <strong>13743895347200 represents the number of 100 nanosecond intervals passed since epoch.</strong> Using the nanosecond representation allows for more uniqueness and finer granularity. Using <strong>100-nanosecond intervals</strong> allows for up to <strong>10 million unique time intervals per second</strong>.</p>
<p>The total timestamp is 60 bits:</p>
<ul>
<li><p>Most significant bits represent older times.</p>
</li>
<li><p>Least significant bits represent the finer granularity within the most recent time interval.</p>
</li>
</ul>
<p><strong><em>Timestamp and its 60-Bit Division in UUIDv1:</em></strong></p>
<p>The 60-bit timestamp is divided across three fields in UUIDv1:</p>
<ol>
<li><p><strong>Time Low</strong>: Stores the <strong>least significant 32 bits</strong> of the timestamp.</p>
</li>
<li><p><strong>Time Mid</strong>: Stores the <strong>next 16 bits</strong>.</p>
</li>
<li><p><strong>Time High</strong>: Stores the <strong>most significant 12 bits</strong>.</p>
</li>
</ol>
<p>These bits are arranged in a specific format to construct the UUID.</p>
<p><strong>Node</strong> is a 48-bit value, often derived from the MAC address of the machine generating the UUID. If the MAC address is unavailable, a random value is used instead.</p>
<p>The <strong>Clock sequence</strong> in <strong>UUID Version 1 (UUIDv1)</strong> is a 14-bit value that ensures uniqueness when generating UUIDs in situations where the system clock may not be reliable.</p>
<p>It helps maintain uniqueness when:</p>
<ul>
<li><p>The system clock is adjusted backward.</p>
</li>
<li><p>The system clock cannot guarantee monotonicity (e.g., due to manual adjustments or hardware issues).</p>
</li>
</ul>
<h2 id="heading-uuidv2">UUIDv2</h2>
<p>In V2 the <code>low_time</code> segment of the structure was replaced with a POSIX user ID. The theory was that these UUIDs could be traced back to the user account that generated them. Since the <code>low_time</code> segment is where much of the variability of UUIDs reside, replacing this segment increases the chance of collision. As a result, this version of the UUID is rarely used.</p>
<h2 id="heading-uuidv3-amp-uuid-v5">UUIDv3 &amp; UUID v5</h2>
<p>Versions 3 and 5 of UUIDs are very similar. The goal of these versions is to allow UUIDs to be generated in a deterministic way so that, given the same information, the same UUID can be generated. These implementations use two pieces of information: a namespace (which itself is a UUID) and a name. These values are run through a hashing algorithm to generate a 128-bit value that can be represented as a UUID.</p>
<p>The key difference between these versions is that version 3 uses an MD5 hashing algorithm, and version 5 uses SHA1.</p>
<h2 id="heading-uuidv4">UUIDv4</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735738242100/0077c61a-75d8-481c-9b4a-7d1054ed1007.png" alt class="image--center mx-auto" /></p>
<p>Version 4 is known as the random variant because, as the name implies, the value of the UUID is almost entirely random. The exception to this is the first position in the third segment of the UUID, which will always be <code>4</code> to signify the <strong>version</strong> used.</p>
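<p>This version is simple enough to sketch by hand: take 16 cryptographically random bytes, then force the version and variant bits per RFC 4122. A minimal Go sketch:</p>
<pre><code class="lang-go">package main

import (
    "crypto/rand"
    "fmt"
)

// uuidv4 builds a random UUID: 122 random bits plus fixed
// version (4) and variant (10xx) bits.
func uuidv4() (string, error) {
    b := make([]byte, 16)
    if _, err := rand.Read(b); err != nil {
        return "", err
    }
    b[6] = (b[6] &amp; 0x0f) | 0x40 // version 4 in the high nibble of byte 6
    b[8] = (b[8] &amp; 0x3f) | 0x80 // variant bits 10xx in byte 8
    return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
    id, err := uuidv4()
    if err != nil {
        panic(err)
    }
    fmt.Println(id) // the third group always starts with 4
}
</code></pre>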
<h2 id="heading-uuidv6">UUIDv6</h2>
<p>Version 6 is nearly identical to Version 1. The only difference is that the bits used to capture the timestamp are <strong>flipped</strong>, meaning the most significant portions of the timestamp are stored first.</p>
<p><code>[time_high_and_version]-[time_mid]-[time_low]-[clock_seq_and_reserved]-[node]</code></p>
<p>The main reason for this is that UUIDv1 had problems when it came to two types of sorting:</p>
<p><strong>Lexicographical order</strong> is similar to how words are arranged in a <strong>dictionary</strong> or <strong>alphabetical</strong> order. It's based on the <strong>lexicographic (dictionary) comparison</strong> of characters or symbols in a string, from left to right.</p>
<p><strong>Chronological order</strong> refers to sorting based on <strong>time</strong> — from the earliest to the latest (or vice versa). This is the kind of order you'd expect when sorting timestamps, dates, or events. In chronological order, items are compared based on their <strong>relative timing</strong>.</p>
<p>UUIDV1 is designed to be unique identifiers that can include a <strong>timestamp</strong>. However, its structure can cause confusion because <strong>lexicographical sorting</strong> does not always align with <strong>chronological sorting</strong>, due to how fields are ordered. e.g</p>
<pre><code class="lang-go">UUID1: a1b2c3d4-e5f6<span class="hljs-number">-11</span>ec<span class="hljs-number">-9</span>abc<span class="hljs-number">-123456789</span>abc
UUID2: a1b2c3d5-e5f6<span class="hljs-number">-11</span>ec<span class="hljs-number">-9</span>abc<span class="hljs-number">-123456789</span>abc
</code></pre>
<ul>
<li><p><strong>Chronological order</strong> would expect <code>UUID1</code> to come before <code>UUID2</code> because <code>UUID1</code> represents an earlier time.</p>
</li>
<li><p><strong>Lexicographical order</strong>, however, would first compare <code>time_low</code> (the first field <code>a1b2c3d4</code>), and since <code>a1b2c3d4</code> is less than <code>a1b2c3d5</code>, it might place <code>UUID1</code> before <code>UUID2</code> — <strong>this happens correctly</strong>, but in some cases, this alignment doesn’t hold true.</p>
</li>
<li><p>If the <strong>time_high_and_version</strong> field were placed at the start, then chronological sorting would naturally match lexicographical sorting, as in <strong>UUIDv6</strong>.</p>
</li>
</ul>
<h2 id="heading-uuidv7">UUIDv7</h2>
<p>Version 7 is also a time-based UUID variant, but it integrates the more commonly used Unix Epoch timestamp instead of the Gregorian calendar date used by Version 1. The other key difference is that the node (the value based on the system generating the UUID) is replaced with randomness, making these UUIDs less trackable back to their source.</p>
<h2 id="heading-use-cases-for-uuids"><strong>Use Cases for UUIDs</strong></h2>
<ul>
<li><p>Session Identifiers</p>
</li>
<li><p>File Storage and Versioning</p>
</li>
<li><p>API Tokens and Authentication</p>
</li>
<li><p>E-commerce and Order Tracking</p>
</li>
<li><p>Event Tracking and Logs</p>
</li>
</ul>
<h1 id="heading-ulids">ULIDs</h1>
<p>A ULID is a <strong>128-bit</strong> identifier, represented as a <strong>26-character string</strong> encoded in <strong>Base32</strong> (with a specific alphabet). It has two main components:</p>
<ul>
<li><p><strong>Timestamp</strong> (48 bits or 6 bytes) it is the first component of a ULID and is stored in <strong>milliseconds</strong> since the <strong>UNIX epoch</strong> (1970-01-01 00:00:00 UTC). The timestamp is packed into the first 48 bits of the ULID, which allows it to be <strong>lexicographically sortable</strong>. This means ULIDs generated in <strong>chronological order</strong> will <strong>sort correctly</strong> without needing special sorting logic.</p>
</li>
<li><p><strong>Randomness</strong> (80 bits or 10 bytes) to ensure that ULIDs are <strong>globally unique</strong>, the second part (after the timestamp) is made up of <strong>80 random bits</strong>. This random component ensures that even if two ULIDs are generated at the exact same millisecond, they will still be distinct.</p>
</li>
</ul>
<h4 id="heading-example">Example:</h4>
<ul>
<li><p>Base32 alphabet: <strong>A-Z, 2-7</strong></p>
</li>
<li><p>ULID: <code>01FZQZ4E0AMK4NK9F7J8N9DAX8</code></p>
</li>
</ul>
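<p>Here’s a minimal sketch in Go of how those two components are packed and rendered. It assumes the Crockford Base32 alphabet (no I, L, O or U); since 26 characters hold 130 bits, the first two encoded bits are zero padding.</p>
<pre><code class="lang-go">package main

import (
    "crypto/rand"
    "fmt"
    "time"
)

// Crockford Base32 alphabet used by ULID.
const alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// encode renders 128 bits as 26 five-bit characters,
// most significant bits first (2 zero pad bits up front).
func encode(b [16]byte) string {
    out := make([]byte, 26)
    for i := 0; i &lt; 26; i++ {
        v := 0
        for j := 0; j &lt; 5; j++ {
            pos := i*5 + j - 2 // bit index into the 128-bit payload
            if pos &lt; 0 {
                continue // the two leading pad bits are zero
            }
            v = v&lt;&lt;1 | int(b[pos/8]&gt;&gt;(7-pos%8)&amp;1)
        }
        out[i] = alphabet[v]
    }
    return string(out)
}

func ulid() (string, error) {
    var b [16]byte
    ms := uint64(time.Now().UnixMilli())
    for i := 5; i &gt;= 0; i-- { // 48-bit big-endian timestamp
        b[i] = byte(ms &amp; 0xff)
        ms &gt;&gt;= 8
    }
    if _, err := rand.Read(b[6:]); err != nil { // 80 random bits
        return "", err
    }
    return encode(b), nil
}

func main() {
    id, err := ulid()
    if err != nil {
        panic(err)
    }
    fmt.Println(id, len(id)) // 26 characters, sortable by time
}
</code></pre>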
<h3 id="heading-use-cases-for-ulids"><strong>Use Cases for ULIDs</strong></h3>
<ul>
<li><p><strong>Distributed systems</strong>: When you need globally unique IDs that are generated independently and can be sorted chronologically.</p>
</li>
<li><p><strong>Database indexing</strong>: ULIDs can be used as primary keys because they are lexicographically sortable, reducing fragmentation and improving performance.</p>
</li>
<li><p><strong>Caching</strong>: When creating time-sensitive keys or identifiers that need to be unique and sortable.</p>
</li>
<li><p>Also all UUID use cases can work here too.</p>
</li>
</ul>
<h3 id="heading-uuid-vs-ulid"><strong>UUID vs ULID</strong></h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>ULID</td><td>UUID</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Length</strong></td><td>26 characters</td><td>36 characters (Hexadecimal)</td></tr>
<tr>
<td><strong>Encoding</strong></td><td>Base32</td><td>Hexadecimal (Base 16)</td></tr>
<tr>
<td><strong>Timestamp</strong></td><td>48 bits (millisecond precision)</td><td>60 bits (100-nanosecond precision)</td></tr>
<tr>
<td><strong>Sortable</strong></td><td>Yes, lexicographically sortable by time</td><td>No (UUIDv6 is partially sortable)</td></tr>
<tr>
<td><strong>Randomness</strong></td><td>80 bits</td><td>80 bits (in UUIDv4)</td></tr>
<tr>
<td><strong>Use Case</strong></td><td>Ideal for distributed systems and databases</td><td>General purpose (e.g., unique identifiers)</td></tr>
</tbody>
</table>
</div><h1 id="heading-nanoids">NanoIDs</h1>
<p>Nanoid is a small, <strong>secure</strong>, and <strong>URL-friendly</strong> <strong>unique identifier</strong> generator, typically producing a <strong>fixed-length</strong> string of random characters. It is much smaller in size compared to UUIDs and ULIDs, making it more efficient in contexts where shorter identifiers are needed. Nanoids do not contain timestamps and are completely random.</p>
<h2 id="heading-key-characteristics-of-nano-ids"><strong>Key Characteristics of Nano IDs</strong>:</h2>
<ul>
<li><p><strong>Length</strong>: A NanoID is around <strong>21 characters</strong> long, but the length can be customized.</p>
</li>
<li><p><strong>Alphabet</strong>: It uses a custom alphabet that avoids characters that might be confusing or problematic in URLs (like <code>/</code>, <code>+</code>, and <code>=</code>).</p>
</li>
</ul>
<p>Nanoids are typically generated from a <strong>cryptographically secure random source</strong>, and the characters in the resulting identifier are drawn from a custom alphabet.</p>
<p><strong>Cryptographically secure random numbers</strong> (CSPRNGs) alone do not guarantee <strong>uniqueness</strong>. They ensure that the values generated are <strong>unpredictable</strong> and <strong>hard to reproduce</strong>, but they do not inherently ensure that each generated value is <strong>unique</strong> across all possible values.</p>
<p>The default alphabet used by Nanoid consists of the following <strong>64 characters</strong> (A-Z, a-z, 0-9, plus <code>_</code> and <code>-</code>):</p>
<p><code>abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-</code></p>
<p>e.g <code>V1StGXR8ZtM08l5c5yLyzLq</code></p>
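<p>A minimal Nanoid-style generator sketched in Go: since 256 is an exact multiple of 64, mapping each random byte modulo 64 onto the alphabet introduces no bias (alphabets of other sizes would need rejection sampling).</p>
<pre><code class="lang-go">package main

import (
    "crypto/rand"
    "fmt"
)

// 64 URL-safe characters; 256 % 64 == 0, so byte-mod-64 is unbiased.
const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-"

func nanoid(size int) (string, error) {
    buf := make([]byte, size)
    if _, err := rand.Read(buf); err != nil {
        return "", err
    }
    out := make([]byte, size)
    for i, b := range buf {
        out[i] = alphabet[int(b)%64] // each byte picks one character
    }
    return string(out), nil
}

func main() {
    id, err := nanoid(21) // the conventional default length
    if err != nil {
        panic(err)
    }
    fmt.Println(id)
}
</code></pre>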
<h1 id="heading-snowflake-ids">Snowflake IDs</h1>
<p><strong>Snowflake IDs</strong> are another form of unique identifier, popularized by <strong>Twitter</strong> and used in systems like <strong>Twitter’s distributed ID generator</strong>. They are designed to be <strong>unique</strong>, <strong>sortable</strong>, and <strong>compact</strong>, and are particularly well-suited for <strong>high-performance</strong>, <strong>distributed systems</strong>.</p>
<p>A typical Snowflake ID is a <strong>64-bit</strong> integer, and the structure is usually broken down as follows (a bit-packing sketch in code follows the list):</p>
<ul>
<li><p>Timestamp (41 Bit) Millisecond timestamp since epoch</p>
</li>
<li><p>Datacenter ID (5 Bit)</p>
</li>
<li><p>Worker/Node ID (5 bit)</p>
</li>
<li><p>Sequence (12 bits) Sequence number for handling multiple IDs generated in the same millisecond</p>
</li>
<li><p>Total bits (64), which is the overall size of the Snowflake ID</p>
</li>
</ul>
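<p>The packing itself is a few shifts and ORs, sketched below in Go. The epoch constant here is Twitter’s custom epoch; a real generator would also track the sequence per millisecond and guard against clock rollback instead of taking it as a parameter.</p>
<pre><code class="lang-go">package main

import (
    "fmt"
    "time"
)

// Layout: 41 bits timestamp | 5 bits datacenter | 5 bits worker | 12 bits sequence.
const (
    epoch      = int64(1288834974657) // Twitter's custom epoch in ms
    seqBits    = 12
    workerBits = 5
    dcBits     = 5
)

func snowflake(dcID, workerID, seq int64) int64 {
    ms := time.Now().UnixMilli() - epoch
    return ms&lt;&lt;(dcBits+workerBits+seqBits) |
        dcID&lt;&lt;(workerBits+seqBits) |
        workerID&lt;&lt;seqBits |
        seq
}

func main() {
    fmt.Println(snowflake(1, 7, 0)) // sortable: later calls yield bigger IDs
}
</code></pre>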
<h3 id="heading-key-characteristics-of-snowflake-ids"><strong>Key Characteristics of Snowflake IDs</strong>:</h3>
<ol>
<li><p><strong>Time-based</strong>: The first 41 bits represent the timestamp, which makes Snowflake IDs <strong>chronologically sortable</strong>.</p>
</li>
<li><p><strong>Machine/Node-aware</strong>: Snowflake IDs include the <strong>datacenter</strong> and <strong>worker (node)</strong> IDs to avoid collisions in distributed environments.</p>
</li>
<li><p><strong>High throughput</strong>: With a <strong>12-bit sequence number</strong>, Snowflake IDs can generate up to <strong>4096 IDs per millisecond</strong> per machine, ensuring <strong>high throughput</strong> in distributed systems.</p>
</li>
<li><p><strong>Compact</strong>: The 64-bit integer format keeps the ID size compact, reducing storage and indexing overhead.</p>
</li>
<li><p><strong>Unique</strong>: The combination of timestamp, machine ID, and sequence number guarantees <strong>global uniqueness</strong>.</p>
</li>
</ol>
<h3 id="heading-best-use-cases-for-snowflake-ids"><strong>Best Use Cases for Snowflake IDs</strong></h3>
<ul>
<li><p>High-Performance Distributed Systems</p>
</li>
<li><p>Event Sourcing</p>
</li>
<li><p>Scalable Web Applications</p>
</li>
<li><p>Logging and Monitoring Systems</p>
</li>
</ul>
<p>A comparison between it and UUIDs</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td><strong>Snowflake ID</strong></td><td><strong>UUID</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Length</strong></td><td>64-bit integer (compact)</td><td>128-bit (longer and bulkier)</td></tr>
<tr>
<td><strong>Structure</strong></td><td>Time-based with machine/worker ID &amp; sequence</td><td>Completely random (UUIDv4), or timestamp-based (UUIDv1/6)</td></tr>
<tr>
<td><strong>Time-sortable</strong></td><td>Yes, naturally sortable by timestamp</td><td>Not lexicographically time-sortable (except UUIDv6/v7)</td></tr>
<tr>
<td><strong>Uniqueness</strong></td><td>Globally unique (based on machine &amp; sequence)</td><td>Globally unique (UUIDv4 random or UUIDv1 timestamp)</td></tr>
<tr>
<td><strong>Performance</strong></td><td>High throughput with up to 4096 IDs per ms</td><td>Lower throughput (especially with UUIDv1)</td></tr>
<tr>
<td><strong>Collisions</strong></td><td>Extremely low, even in distributed systems</td><td>Low (but possible with UUIDv4 in high generation rate)</td></tr>
<tr>
<td><strong>Use Case</strong></td><td>High-throughput, distributed systems</td><td>General use cases, API tokens</td></tr>
</tbody>
</table>
</div><h1 id="heading-summary">Summary</h1>
<p>In this article we went through the most popular unique identifier generation schemes, listing the use cases for each and every one. If you’re planning on adding one, I recommend understanding the main differences between them before making a decision, because choosing the wrong unique identifier scheme at a big scale could affect performance negatively. That’s been it for this one, see you in the next!</p>
<h1 id="heading-references">References</h1>
<ol>
<li><p><a target="_blank" href="https://planetscale.com/blog/the-problem-with-using-a-uuid-primary-key-in-mysql#uuidv4">https://planetscale.com/blog/the-problem-with-using-a-uuid-primary-key-in-mysql#uuidv4</a></p>
</li>
<li><p><a target="_blank" href="https://adileo.github.io/awesome-identifiers/">https://adileo.github.io/awesome-identifiers/</a></p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Memory-Efficient Byte Processing: Streaming for Large Blobs]]></title><description><![CDATA[Hello folks! In this article i’m going to be talking about some tips on how to minimize memory usage (RAM) while dealing with large blobs of data. Whether it be downloading files, reading data from source and writing to destination, etc. I’ll be doin...]]></description><link>https://hewi.blog/memory-efficient-byte-processing-streaming-for-large-blobs</link><guid isPermaLink="true">https://hewi.blog/memory-efficient-byte-processing-streaming-for-large-blobs</guid><category><![CDATA[Go Language]]></category><category><![CDATA[backend]]></category><category><![CDATA[streaming]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Fri, 27 Dec 2024 15:00:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1735311622213/d690554b-2ab2-406f-8e71-4b35cfd3e951.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello folks! In this article i’m going to be talking about some tips on how to minimize memory usage (RAM) while dealing with large blobs of data. Whether it be downloading files, reading data from source and writing to destination, etc. I’ll be doing a demo in Go monitoring the memory usage and talking about how streaming the data from source to destination is a better approach. Let’s get started.</p>
<h1 id="heading-naive-approach">Naive Approach</h1>
<p>Let’s say we need to download a large file in our application code and save it somewhere on disk.</p>
<p>The naive approach someone would do is the following:</p>
<pre><code class="lang-go">
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">downloadFile</span><span class="hljs-params">(filepath <span class="hljs-keyword">string</span>, url <span class="hljs-keyword">string</span>)</span> <span class="hljs-title">error</span></span> {
    out, err := os.Create(filepath)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }
    <span class="hljs-keyword">defer</span> out.Close()

    resp, err := http.Get(url)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }
    <span class="hljs-keyword">defer</span> resp.Body.Close()

    data, _ := io.ReadAll(resp.Body) <span class="hljs-comment">// Whole file in memory btw</span>
    printMemStats()

    _, err = out.Write(data)
    <span class="hljs-keyword">return</span> err
}
</code></pre>
<p>This snippet does the following:</p>
<ol>
<li><p>Creates a new file for the &lt;to be downloaded file&gt;</p>
</li>
<li><p>Downloads the file</p>
</li>
<li><p>Copies the bytes from the downloaded blob to the created file</p>
</li>
</ol>
<p>I have the function <code>printMemStats</code> that’ll give us info on memory usage</p>
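<p>The helper itself isn’t shown in the snippet above; a plausible implementation (assuming only the standard library’s <code>runtime</code> and <code>fmt</code> packages are imported) would look like this:</p>
<pre><code class="lang-go">// printMemStats dumps the runtime's memory counters in MB.
func printMemStats() {
    var m runtime.MemStats
    runtime.ReadMemStats(&amp;m)
    toMB := func(b uint64) float64 { return float64(b) / 1024 / 1024 }
    fmt.Printf("Alloc: %.2f MB (currently in use)\n", toMB(m.Alloc))
    fmt.Printf("TotalAlloc: %.2f MB (allocated since start)\n", toMB(m.TotalAlloc))
    fmt.Printf("Sys: %.2f MB (reserved from the OS)\n", toMB(m.Sys))
    fmt.Printf("HeapAlloc: %.2f MB (heap in use)\n", toMB(m.HeapAlloc))
    fmt.Printf("HeapSys: %.2f MB (heap reserved from the OS)\n", toMB(m.HeapSys))
    fmt.Printf("NumGC: %d (garbage collections)\n", m.NumGC)
}
</code></pre>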
<p>Running this on a 100MB file and monitoring memory, we can deduce the following:</p>
<pre><code class="lang-go">Alloc: <span class="hljs-number">117.94</span> MB (currently in use)
TotalAlloc: <span class="hljs-number">587.91</span> MB (total allocated memory) (This is the total amount of memory that has been allocated by the program since it started. It includes both the memory that is still in use and the memory that has been released by the garbage collector.)
Sys: <span class="hljs-number">253.92</span> MB (total memory reserved from the OS)
HeapAlloc: <span class="hljs-number">117.94</span> MB (heap memory in use)
HeapSys: <span class="hljs-number">247.11</span> MB (total heap memory reserved from OS to the app)
NumGC: <span class="hljs-number">21</span> (number of garbage collections)
</code></pre>
<p>Looking at the allocated memory, about 118MB was in use, which makes sense: the downloaded file alone was 100MB, plus some extra memory required by the Go runtime.</p>
<p>Now imagine this file being 1 GB instead. Having a single process in the Go app hogging 1 GB of memory is very bad practice. We can do better, so let’s discover the art of <strong>streaming data</strong>.</p>
<h1 id="heading-streaming-data">Streaming Data</h1>
<p>So the idea simplified is instead of having the flow look like this</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735309609120/0d367a5c-7355-463d-a55a-c10f3de17919.png" alt class="image--center mx-auto" /></p>
<p>How about we get rid of the red part and go straight to <strong>writing to the file!</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735309723714/6f8c52ef-4582-4f19-95f6-11d27a8fbc6c.png" alt class="image--center mx-auto" /></p>
<blockquote>
<p>The write buffer here isn’t in Go itself: <code>out.Write(data)</code> hands the bytes to the operating system, which buffers them in its page cache before flushing to disk, reducing the number of I/O operations, which are very expensive.</p>
</blockquote>
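<p>If you do want explicit buffering in user space, you can wrap the file in a <code>bufio.Writer</code>. A minimal sketch (my addition, not part of the original demo):</p>
<pre><code class="lang-go">import (
    "bufio"
    "os"
)

// writeBuffered accumulates small writes in bufio's internal buffer
// (4096 bytes by default) and flushes them to the file in larger
// chunks, reducing the number of write syscalls.
func writeBuffered(f *os.File, data []byte) error {
    w := bufio.NewWriter(f)
    if _, err := w.Write(data); err != nil {
        return err
    }
    return w.Flush() // don't forget to flush the remaining bytes
}
</code></pre>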
<p>What I love the most about this is that it’s just plain creativity! Instead of downloading the whole blob and then transferring it, we skip the intermediate step entirely: as the bytes come in, we write them straight to the file, with minimal memory overhead. Let’s translate this to our code and check the memory stats after updating.</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">downloadFile</span><span class="hljs-params">(filepath <span class="hljs-keyword">string</span>, url <span class="hljs-keyword">string</span>)</span> <span class="hljs-title">error</span></span> {
    out, err := os.Create(filepath)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }
    <span class="hljs-keyword">defer</span> out.Close()

    resp, err := http.Get(url)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }
    <span class="hljs-keyword">defer</span> resp.Body.Close()
    _, err = io.Copy(out, resp.Body) <span class="hljs-comment">// STREAM THE BODY TO FILE (and capture the copy error)</span>

    printMemStats()

    <span class="hljs-keyword">return</span> err
}
</code></pre>
<p>Now <code>io.Copy</code> internally buffers between source and destination: by default it allocates a 32 KB buffer, unless the source implements <code>io.WriterTo</code> or the destination implements <code>io.ReaderFrom</code>, in which case the copy is delegated to them.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735311043584/3181a5aa-2552-444c-9e75-b4e6cee2cfc6.png" alt class="image--center mx-auto" /></p>
<p>Running the code, the memory stats are as follows:</p>
<pre><code class="lang-go">Alloc: <span class="hljs-number">0.549</span> MB
TotalAlloc: <span class="hljs-number">0.549</span> MB
Sys: <span class="hljs-number">8.209</span> MB
HeapAlloc: <span class="hljs-number">0.549</span> MB
HeapSys: <span class="hljs-number">3.776</span> MB
</code></pre>
<p>Almost a 99% decrease in memory! And it isn’t just memory efficient, it’s faster as well, because we skipped a whole step.</p>
<p>This technique isn’t limited to files or downloads. It also applies to API design inside your program: instead of passing a large byte slice around, forcing the whole blob to sit in memory at once, accept an <code>io.Reader</code> and process the data as it streams from source to destination.</p>
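<p>For example, a function that needs to checksum a large blob can accept an <code>io.Reader</code> instead of a <code>[]byte</code>, so it never holds the whole blob at once (a minimal sketch):</p>
<pre><code class="lang-go">import (
    "crypto/sha256"
    "encoding/hex"
    "io"
)

// checksum hashes whatever the reader produces, chunk by chunk,
// without ever materializing the whole blob in memory.
func checksum(r io.Reader) (string, error) {
    h := sha256.New()
    if _, err := io.Copy(h, r); err != nil {
        return "", err
    }
    return hex.EncodeToString(h.Sum(nil)), nil
}

// usage: sum, err := checksum(resp.Body)
</code></pre>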
<h1 id="heading-summary">Summary</h1>
<p>Most modern open source applications use tricks like these to optimize for memory and performance, skipping buffers and unnecessary overhead wherever they can. It’s the deep understanding of what goes on behind the scenes that opens the door to optimization. And when you visualize the data flow, the right approach becomes much clearer.</p>
]]></content:encoded></item><item><title><![CDATA[A backend engineer lost in the DevOps world - Authentication and Authorization in Kubernetes]]></title><description><![CDATA[Introduction
Hello folks! In this one we’re going all in on authorization and authentication in Kubernetes. Whenever you get access to a Kubernetes cluster in your job do you ever wonder what happens behinds the scenes? The DevOps guy just sends you ...]]></description><link>https://hewi.blog/a-backend-engineer-lost-in-the-devops-world-authentication-and-authorization-in-kubernetes</link><guid isPermaLink="true">https://hewi.blog/a-backend-engineer-lost-in-the-devops-world-authentication-and-authorization-in-kubernetes</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[backend]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Sat, 21 Dec 2024 14:05:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1734789910119/73163c69-4169-400b-9bc1-8b37f023ef91.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Hello folks! In this one we’re going all in on authorization and authentication in Kubernetes. Whenever you get access to a Kubernetes cluster at your job, do you ever wonder what happens behind the scenes? The DevOps guy just sends you a kubeconfig YAML file and you start using it. We’ll be going through the basics of authentication and authorization in Kubernetes, covering 3 main parts:</p>
<ol>
<li><p>Kubernetes Authentication Workflow</p>
</li>
<li><p>Authentication Methods in Kubernetes</p>
</li>
<li><p>User vs. Pod Authentication</p>
</li>
</ol>
<h1 id="heading-authentication-workflow">Authentication Workflow</h1>
<p>The Kubernetes authentication workflow serves as the first line of defense, verifying "who" is making the request to the Kubernetes API server, which is the main entry point for all Kubernetes operations. Every request to the cluster, whether it’s from a <code>kubectl</code> command, a CI/CD pipeline, or an application, must pass through the API server.</p>
<p>Any incoming request first passes through the <strong>Identity Assertion</strong> phase. The request usually carries one of the following:</p>
<ul>
<li><p><strong>Client Certificates</strong>: TLS certificates that the API server can validate.</p>
</li>
<li><p><strong>Bearer Tokens</strong>: Passed in the <code>Authorization</code> header of the HTTP request.</p>
</li>
<li><p><strong>Custom Mechanisms</strong>: OpenID Connect (OIDC), cloud IAM integrations (e.g., AWS IAM), etc.</p>
</li>
</ul>
<p>Kubernetes supports a pluggable authentication mechanism. The API server runs the configured authentication plugins one after another, for example:</p>
<ul>
<li><p>Static Token File.</p>
</li>
<li><p>Client Certificate Authentication.</p>
</li>
<li><p>Webhook Token Authentication.</p>
</li>
<li><p>OpenID Connect (OIDC).</p>
</li>
<li><p>HTTP Proxy Authentication.</p>
</li>
</ul>
<p>If a plugin authenticates the request, the process stops: the identity is assigned to the request, and it moves on to the authorization phase. If no plugin succeeds, the API server returns a 401 Unauthorized.</p>
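<p>Conceptually, the chain behaves like the following Go sketch (illustrative only, not the actual kube-apiserver code; the interface here is made up for the example):</p>
<pre><code class="lang-go">import (
    "errors"
    "net/http"
)

// Authenticator is one pluggable authentication method.
type Authenticator interface {
    // Authenticate returns the identity and true on success,
    // or false to let the next plugin try.
    Authenticate(r *http.Request) (identity string, ok bool)
}

// authenticate runs the configured plugins one by one; the first
// success wins, otherwise the request is rejected with a 401.
func authenticate(r *http.Request, plugins []Authenticator) (string, error) {
    for _, p := range plugins {
        if id, ok := p.Authenticate(r); ok {
            return id, nil // identity assigned; move on to authorization
        }
    }
    return "", errors.New("401 Unauthorized: no plugin authenticated the request")
}
</code></pre>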
<p>Once authentication confirms the identity, the request moves to the <strong>authorization</strong> phase. In this phase:</p>
<ul>
<li><p>Kubernetes evaluates whether the authenticated user has permission to perform the requested action on the specified resource.</p>
</li>
<li><p>Authorization mechanisms include for example Role-Based Access Control (RBAC)</p>
</li>
</ul>
<p>Once authorized, the request is processed and the response is returned to the caller.</p>
<h1 id="heading-authentication-methods-in-kubernetes">Authentication Methods in Kubernetes</h1>
<p>As mentioned above there exist different authentication methods in Kubernetes, these methods cater to both user (human) and machine authentication. Here’s an overview of the main methods:</p>
<ul>
<li><p><strong>Static Token File</strong>, where tokens are stored in a static file provided to the Kubernetes API server. These tokens never change unless rotated manually. Not ideal for production environments, as a token could get stolen by an attacker and used for malicious purposes.</p>
</li>
<li><p><strong>Service Account Tokens</strong>, where tokens are automatically generated and mounted inside pods (historically stored in Kubernetes Secrets). These tokens are tied to Kubernetes Service Accounts, which carry specific permissions, making them ideal for pod-to-API-server communication, especially for applications running inside the cluster. Before Kubernetes 1.21 they used to be static; since then they are renewable and bound to stricter audiences and permissions.</p>
</li>
<li><p><strong>Client Certificate Authentication</strong> uses TLS certificates to verify identities, ensuring that only trusted entities can access the Kubernetes API server. Can be used to authenticate humans alone or with service accounts together. A certificate authority generates and signs a user’s certificate and validates against it.</p>
</li>
<li><p><strong>Custom mechanisms</strong>, including OpenID Connect, where Kubernetes delegates authentication to an external OIDC provider instead of handling it natively.</p>
</li>
</ul>
<h1 id="heading-user-vs-pod-authentication">User vs Pod Authentication</h1>
<p>Kubernetes provides distinct mechanisms to authenticate <strong>users</strong> (human operators or external tools) and <strong>pods</strong> (workloads running within the cluster).</p>
<h2 id="heading-user-authentication">User Authentication</h2>
<p>User authentication refers to how human users or external tools (e.g., CI/CD pipelines) authenticate with the Kubernetes API server to perform actions like managing resources. Methods include:</p>
<ol>
<li><p>Certificate Authentication</p>
</li>
<li><p>Static Tokens (Not recommended)</p>
</li>
<li><p>OpenID Connect (OIDC)</p>
</li>
</ol>
<p>Kubernetes doesn’t manage users directly; external systems (e.g., certificates, identity providers) are required.</p>
<h2 id="heading-pod-authentication">Pod Authentication</h2>
<p>Pod authentication refers to how workloads running inside the cluster (e.g., pods) authenticate with the Kubernetes API server to perform actions like reading secrets or interacting with resources.</p>
<p>The main way to do this is by service accounts, where each pod is automatically assigned a service account and the tokens get mounted into the pod.</p>
<p>These tokens are used by applications within the pod to authenticate with the API server.</p>
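<p>To show the mechanics, here’s a rough Go sketch of an in-pod client attaching the mounted token as a bearer token (in practice, client-go’s <code>rest.InClusterConfig()</code> does all of this for you):</p>
<pre><code class="lang-go">import (
    "net/http"
    "os"
)

// The service account token is mounted at this path in every pod by default.
const tokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

// newAPIServerRequest builds a request to the API server authenticated
// with the pod's service account token. (A real client would also load
// the mounted CA certificate for TLS verification.)
func newAPIServerRequest(url string) (*http.Request, error) {
    token, err := os.ReadFile(tokenPath)
    if err != nil {
        return nil, err
    }
    req, err := http.NewRequest(http.MethodGet, url, nil)
    if err != nil {
        return nil, err
    }
    req.Header.Set("Authorization", "Bearer "+string(token))
    return req, nil
}
</code></pre>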
<p>The benefit of this is that service accounts are namespace-scoped and tightly controlled.</p>
<h1 id="heading-summary">Summary</h1>
<p>There you have it! A brief introduction to authentication and authorization in Kubernetes. I think as a backend developer it’s important to understand these concepts and appreciate how dynamic and flexible Kubernetes made them. The designs can carry over to your day-to-day work.</p>
<h1 id="heading-whats-next">What’s Next?</h1>
<p>I’ll be doing a demo where we imagine someone new has joined the company and needs minimal access to the cluster. I’ll walk through the basic steps to set that up. See you there!</p>
]]></content:encoded></item><item><title><![CDATA[A backend engineer lost in the DevOps world - Auto Scaling In Kubernetes]]></title><description><![CDATA[Introduction
Hello folks and welcome to the second part of the series I made where I discover DevOps concepts that I wanted to understand as a backend engineer. In this one we dive into Kubernetes AutoScaling where we’ll be going through its basics a...]]></description><link>https://hewi.blog/a-backend-engineer-lost-in-the-devops-world-auto-scaling-in-kubernetes</link><guid isPermaLink="true">https://hewi.blog/a-backend-engineer-lost-in-the-devops-world-auto-scaling-in-kubernetes</guid><category><![CDATA[Devops]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[scalability]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[backend]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Thu, 28 Nov 2024 19:28:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1732822048093/c7624399-bd96-40b4-b985-3ebcc0a4df9e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Hello folks and welcome to the second part of the series I made where I discover DevOps concepts that I wanted to understand as a backend engineer. In this one we dive into Kubernetes AutoScaling where we’ll be going through its basics and testing it to make sure we understand everything going on. Let’s start!</p>
<h1 id="heading-autoscaler-basics">AutoScaler Basics</h1>
<h2 id="heading-metrics-server">Metrics Server</h2>
<p>An autoscaler scales automatically when certain metrics reach an agreed-upon threshold. Simple, right? There’s a lot more to it, though.</p>
<p>The autoscaler relies on a metrics source in order to actually watch for metric changes. Kubernetes uses a component called the <strong>Metrics Server</strong> to collect resource metrics (like CPU and memory usage) for <strong>pods</strong> and <strong>nodes</strong> in a cluster. The Metrics Server aggregates these metrics and makes them available to components like the <strong>Horizontal Pod Autoscaler (HPA)</strong> and other monitoring tools.</p>
<ul>
<li><p>The <strong>Metrics Server</strong> is a lightweight, cluster-wide aggregator of resource usage data (like CPU and memory) for nodes and pods.</p>
</li>
<li><p>It <strong>does not</strong> store historical data — it only provides the current resource usage (live metrics).</p>
</li>
<li><p>The Metrics Server collects data from the <strong>kubelet</strong> (the primary node agent that runs on each node).</p>
</li>
</ul>
<p>The kubelet exposes the metrics of the node’s containers/pods on port <code>10250</code> at the <code>/metrics/resource</code> endpoint.</p>
<p>In some managed Kubernetes environments (e.g., GKE) the Metrics Server comes preinstalled; in others, and in local tools like kind or Minikube, it isn’t installed by default. To install it in your local cluster:</p>
<p><code>kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml</code></p>
<p>And verify it’s running using this command:</p>
<p><code>kubectl get deployment metrics-server -n kube-system</code></p>
<blockquote>
<p>the command <code>kubectl top pods</code> is only available to use once the metrics server is installed and running</p>
</blockquote>
<p>Now we have a Metrics Server pulling metrics from every node’s kubelet. We need to do something with this information, and yep, you probably guessed it: autoscaling!</p>
<h2 id="heading-nginx-deployment-example">Nginx Deployment (Example)</h2>
<p>Before moving on to the actual autoscaling, our example will include a simple nginx deployment where we will monitor CPU usage and add an autoscaler to it. We will run an infinite while loop sending requests to the nginx server and watch the autoscaling happen.</p>
<p>This is the deployment/service manifest file for nginx</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nginx-deployment</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">2</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">nginx</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">nginx</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nginx</span>
        <span class="hljs-attr">image:</span> <span class="hljs-string">nginx:latest</span>
        <span class="hljs-attr">resources:</span>
          <span class="hljs-attr">requests:</span>
            <span class="hljs-attr">cpu:</span> <span class="hljs-string">"100m"</span>
            <span class="hljs-attr">memory:</span> <span class="hljs-string">"128Mi"</span>
          <span class="hljs-attr">limits:</span>
            <span class="hljs-attr">cpu:</span> <span class="hljs-string">"200m"</span>
            <span class="hljs-attr">memory:</span> <span class="hljs-string">"256Mi"</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nginx-service</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">nginx</span>
  <span class="hljs-attr">ports:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
      <span class="hljs-attr">targetPort:</span> <span class="hljs-number">80</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">LoadBalancer</span>
</code></pre>
<p>Each pod has a request of <code>100 millicores CPU (0.1 Core)</code> and <code>128 Mebibytes</code> and a limit of <code>0.2 Core &amp; 256 Mebibytes</code></p>
<blockquote>
<p>Mebibytes are binary units (unlike megabytes, which are decimal). In computing, memory is inherently binary (base-2); for example, RAM sizes are measured in powers of 2 (e.g., 512 MiB, 1 GiB). <strong>Megabytes</strong> can cause confusion because their decimal nature doesn’t match binary-based memory calculations. 1 Mebibyte = 1024² bytes.</p>
</blockquote>
<h2 id="heading-hpa-manifest">HPA Manifest</h2>
<p>The basic autoscaling manifest looks something like this (In this context we have an nginx deployment where we’re going to apply HPA to it)</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">autoscaling/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">HorizontalPodAutoscaler</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nginx-hpa</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">scaleTargetRef:</span>
    <span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
    <span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">nginx-deployment</span>
  <span class="hljs-attr">minReplicas:</span> <span class="hljs-number">2</span>
  <span class="hljs-attr">maxReplicas:</span> <span class="hljs-number">5</span>
  <span class="hljs-attr">metrics:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">Resource</span>
      <span class="hljs-attr">resource:</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">cpu</span>
        <span class="hljs-attr">target:</span>
          <span class="hljs-attr">type:</span> <span class="hljs-string">Utilization</span>
          <span class="hljs-attr">averageUtilization:</span> <span class="hljs-number">20</span>
</code></pre>
<p>When the <strong>average CPU utilization of all currently running pods</strong> of the nginx deployment exceeds 20% (just an example, not a practical value), autoscaling starts.</p>
<p>Let’s say we have 2 replicas already running where both have a cpu utilization of 25%</p>
<p>Thus the average cpu utilization is:</p>
<p>(<code>utilization of first pod + utilization of second pod)/ 2 (pod count)</code> = <code>25%</code></p>
<p>To figure out how many more replicas we need so that the average utilization drops back under 20%, we look at the ratio between the actual and target utilizations: <code>25/20 = 1.25</code></p>
<p>Meaning, the <strong>actual utilization is 1.25× the target utilization.</strong></p>
<p>By multiplying this scaling factor (1.25) by the <strong>current number of replicas</strong>, we determine how many replicas are needed to bring CPU utilization down to the target. In our case that’s 2 × 1.25 = 2.5, and since we can’t create a fraction of a pod we <strong>ceil</strong> the result, so after scaling up we should have 3 replicas instead of 2.</p>
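<p>This is the standard HPA formula: <code>desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization)</code>. As a tiny Go sketch of the arithmetic:</p>
<pre><code class="lang-go">import "math"

// desiredReplicas implements the HPA scaling formula:
// ceil(currentReplicas * currentUtilization / targetUtilization).
func desiredReplicas(current int, currentUtil, targetUtil float64) int {
    return int(math.Ceil(float64(current) * currentUtil / targetUtil))
}

// desiredReplicas(2, 25, 20) == 3: scale from 2 replicas to 3.
</code></pre>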
<p>The HPA manifest above is a very simple configuration. There are many more options, for example around scaling down (how long to wait before scaling down again), but for the sake of this article we’ll keep it simple.</p>
<p>Upon applying the above manifests. We should have HPA installed to our nginx deployment.</p>
<h2 id="heading-testing">Testing</h2>
<p>We can test using the <code>busybox</code> image, which gives us a shell where we can run a while loop sending requests to the nginx web server.</p>
<p><code>kubectl run busybox --image=busybox --rm -it -- /bin/sh</code></p>
<p>Then inside the shell</p>
<p><code>while true; do wget -q -O- http://nginx-service; done</code></p>
<p>If we execute <code>kubectl get hpa</code> we can find a <code>TARGETS</code> column showing <code>x%/20%</code>, which means the current utilization over the target utilization specified in the HPA manifest.</p>
<p>We can monitor and check that once the current passes the threshold new replicas are created according to the equation mentioned above!</p>
<h2 id="heading-auto-scaling-down">Auto Scaling Down</h2>
<p>When a Kubernetes HPA is configured, it monitors certain metrics (like CPU or memory utilization) at regular intervals (every 15 seconds by default). If the current resource usage falls below the defined target utilization threshold, the HPA will scale down the number of pods.</p>
<h4 id="heading-scaling-down-criteria"><strong>Scaling Down Criteria:</strong></h4>
<ul>
<li><p><strong>Target Utilization vs. Current Resource Usage:</strong> if the current usage is below the target, the HPA will scale down</p>
</li>
<li><p><strong>MinReplicas</strong>: the HPA never scales below this number</p>
</li>
<li><p><strong>Cooldown Period (Stabilization Window):</strong> Kubernetes doesn’t immediately scale down when resource usage decreases slightly. It has a built-in <strong>stabilization period</strong> to avoid <strong>flapping</strong>, which is when scaling occurs rapidly back and forth.</p>
</li>
</ul>
<p>The HPA manifest has a <code>behavior</code> section which allows you to specify custom scaling behavior, including how quickly to scale down.</p>
<p>In our HPA manifest we can update it as follows:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">autoscaling/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">HorizontalPodAutoscaler</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nginx-hpa</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">scaleTargetRef:</span>
    <span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
    <span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">nginx-deployment</span>
  <span class="hljs-attr">minReplicas:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">maxReplicas:</span> <span class="hljs-number">5</span>
  <span class="hljs-attr">metrics:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">Resource</span>
      <span class="hljs-attr">resource:</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">cpu</span>
        <span class="hljs-attr">target:</span>
          <span class="hljs-attr">type:</span> <span class="hljs-string">Utilization</span>
          <span class="hljs-attr">averageUtilization:</span> <span class="hljs-number">20</span>
  <span class="hljs-attr">behavior:</span>
    <span class="hljs-attr">scaleDown:</span>
      <span class="hljs-attr">policies:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">Percent</span>
          <span class="hljs-attr">value:</span> <span class="hljs-number">20</span>
          <span class="hljs-attr">periodSeconds:</span> <span class="hljs-number">60</span>
<span class="hljs-comment"># Scale down by 20% every 60 seconds</span>
</code></pre>
<p>The behavior here is: scale down by at most 20% of the current replicas every 60 seconds. The default <code>stabilizationWindowSeconds</code> for scale-down is 300 seconds (5 minutes), but it can be configured too.</p>
<h1 id="heading-summary">Summary</h1>
<p>In this article we took a look at how HPA works in Kubernetes based on metrics such as CPU and memory: we saw how these metrics are collected via the Metrics Server and how they are used to scale up and down. I’ll make a part two where we autoscale based on <strong>custom</strong> metrics, such as response-time percentiles and load. That will require extra work, but it’s worth it. See you in the next one!</p>
]]></content:encoded></item><item><title><![CDATA[A backend engineer lost in the DevOps world - Making a Kubernetes Operator with Go]]></title><description><![CDATA[Introduction
Hello folks! This will be a series of articles where I try diving into complex Devops topics simplifying them for us backend engineers and making the article serve as a quick recap for whoever is interested. In this article we’ll dive in...]]></description><link>https://hewi.blog/a-backend-engineer-lost-in-the-devops-world-making-a-kubernetes-operator-with-go</link><guid isPermaLink="true">https://hewi.blog/a-backend-engineer-lost-in-the-devops-world-making-a-kubernetes-operator-with-go</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Go Language]]></category><category><![CDATA[Devops]]></category><category><![CDATA[backend]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Fri, 22 Nov 2024 13:21:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1732281621838/45f42a78-cd0f-437c-a24b-ed8e5d4b3dca.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Hello folks! This will be a series of articles where I dive into complex DevOps topics, simplifying them for us backend engineers, so that each article can serve as a quick recap for whoever is interested. In this article we’ll dive into Kubernetes operators, and we’ll build our first operator using Go (not from scratch; we’ll use a handy tool called <code>kubebuilder</code> that generates the boilerplate so we can focus on what actually matters). Before moving forward, let’s talk about what operators even are.</p>
<h1 id="heading-kubernetes-operators">Kubernetes Operators</h1>
<p>A <strong>Kubernetes Operator</strong> is a method of automating the management of complex applications on Kubernetes, a typical operator consists of:</p>
<ol>
<li><p><strong>Controller</strong>: A program that watches and reacts to changes in Kubernetes resources (like Custom Resources) and takes actions to manage the application’s lifecycle (e.g., creating, updating, or deleting resources).</p>
</li>
<li><p><strong>Custom Resource (CR)</strong>: A custom-defined object that represents the application or service the operator manages. It defines the desired state (e.g., number of replicas, configuration) of that application.</p>
</li>
</ol>
<p>It automates the management of complex applications by using a controller to watch a custom resource and ensure the application matches the desired state.</p>
<p>Now, if you were like me the first time I read this, you probably didn’t understand much of that. It’s time to simplify this even further.</p>
<p>The operator is an umbrella for 2 main things:</p>
<ol>
<li><p>A custom resource, which is a new type of object that Kubernetes doesn’t natively know about (for example, <strong>pod</strong> is a built-in resource, while <strong>pod-stalker</strong> would be a custom resource because it isn’t natively part of Kubernetes).</p>
<p> So we create new custom resources that have a defined <strong>schema</strong>. This will become much clearer when we implement the actual operator.</p>
</li>
<li><p>The controller, which is the brain of the operator and where the main code lives. The controller watches for changes in the custom resource and runs logic based on what happened. This process is called <strong>reconciliation</strong>, where the goal is always to get from the current state back to the desired state. The schema of the custom resource is defined in a <strong>Custom Resource Definition (CRD)</strong>, and the desired state itself is expressed in a YAML manifest that instantiates the custom resource by filling in that schema.</p>
</li>
</ol>
<p>Enough with the theory, let’s walk through a cool project. We will create a custom resource <code>PodTracker</code> that watches pods with a specified name in the <code>default</code> namespace and sends a message to a Slack channel whenever such a pod is created.</p>
<h1 id="heading-podtracker-walkthrough">PodTracker Walkthrough</h1>
<p>To get started first install <a target="_blank" href="https://book.kubebuilder.io/quick-start.html"><code>kubebuilder</code></a></p>
<p><strong>Kubebuilder</strong> is a framework for building Kubernetes Operators and Custom Controllers. It provides a set of tools and libraries to help you easily create, test, and manage Kubernetes Operators, which automate the management of complex applications on Kubernetes.</p>
<p>This will give us a great scaffold to start off of.</p>
<p>Once kubebuilder is installed let’s run the commands to create a scaffold</p>
<pre><code class="lang-bash"><span class="hljs-comment"># initialize a new kubebuilder project in Go. </span>
kubebuilder init --domain lost.backend --repo lost.backend/pod-tracker
<span class="hljs-comment"># creates the API responsible for our new Custom Resource</span>
kubebuilder create api --group pod-tracker --version v1 --kind PodTracker
</code></pre>
<p>Once installed the file structure should look something like this</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1732276164301/4fef84b3-00df-4a56-843e-5635e3f5a2d6.png" alt class="image--center mx-auto" /></p>
<p>We’re only going to be concerned with two main files</p>
<ul>
<li><p><code>internal/controller/podtracker_controller.go</code></p>
</li>
<li><p><code>api/v1/podtracker_types.go</code></p>
</li>
</ul>
<p>Let’s start by defining our Custom Resource Schema first</p>
<h2 id="heading-custom-resource-schema">Custom Resource Schema</h2>
<p>Inside <code>api/v1/podtracker_types.go</code></p>
<p>You’ll find a structure that looks like the following</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> PodTrackerSpec <span class="hljs-keyword">struct</span> {
}

<span class="hljs-comment">// PodTrackerStatus defines the observed state of PodTracker.</span>
<span class="hljs-keyword">type</span> PodTrackerStatus <span class="hljs-keyword">struct</span> {

}

<span class="hljs-comment">// +kubebuilder:object:root=true</span>
<span class="hljs-comment">// +kubebuilder:subresource:status</span>

<span class="hljs-comment">// PodTracker is the Schema for the podtrackers API.</span>
<span class="hljs-keyword">type</span> PodTracker <span class="hljs-keyword">struct</span> {
    metav1.TypeMeta   <span class="hljs-string">`json:",inline"`</span>
    metav1.ObjectMeta <span class="hljs-string">`json:"metadata,omitempty"`</span>

    Spec   PodTrackerSpec   <span class="hljs-string">`json:"spec,omitempty"`</span>
    Status PodTrackerStatus <span class="hljs-string">`json:"status,omitempty"`</span>
}
</code></pre>
<p><code>PodTrackerSpec</code> is for the desired schema of the pod tracker. We will have the following:</p>
<ul>
<li><p>A <code>Name</code> field ensuring this pod tracker tracks pods with the given name (for simplicity; pod names are usually unique, so a better approach would be to reference a deployment name and monitor pods that belong to that specific deployment)</p>
</li>
<li><p>A <code>Reporter</code> struct that contains <code>Kind, Key &amp; Channel</code>, describing the kind of reporting (in our case Slack), its API key, and the channel to post to</p>
</li>
</ul>
<p><code>PodTrackerStatus</code> defines the current observed state of the PodTracker, which is usually updated during reconciliation (the controller updates it according to events that occur). Let’s leave this empty for now.</p>
<p>Our types file should now look like this:</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> Reporter <span class="hljs-keyword">struct</span> {
    Kind    <span class="hljs-keyword">string</span> <span class="hljs-string">`json:"kind,omitempty"`</span>
    Key     <span class="hljs-keyword">string</span> <span class="hljs-string">`json:"key,omitempty"`</span>
    Channel <span class="hljs-keyword">string</span> <span class="hljs-string">`json:"channel,omitempty"`</span>
}
<span class="hljs-keyword">type</span> PodTrackerSpec <span class="hljs-keyword">struct</span> {
    Name     <span class="hljs-keyword">string</span>   <span class="hljs-string">`json:"name,omitempty"`</span>
    Reporter Reporter <span class="hljs-string">`json:"reporter,omitempty"`</span>
}

<span class="hljs-comment">// PodTrackerStatus defines the observed state of PodTracker.</span>
<span class="hljs-keyword">type</span> PodTrackerStatus <span class="hljs-keyword">struct</span> {
    PodCount <span class="hljs-keyword">int</span>    <span class="hljs-string">`json:"podCount,omitempty"`</span>
    Status   <span class="hljs-keyword">string</span> <span class="hljs-string">`json:"status,omitempty"`</span>
}
</code></pre>
<p><strong>NOTE: it’s essential to add the JSON tags, otherwise the fields won’t serialize properly in the generated manifests.</strong></p>
<p>Now that we’ve defined our custom resource, it’s time to actually write an instance of it. Something like this:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># tracker.yaml</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">"pod-tracker.lost.backend/v1"</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">"PodTracker"</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">pod-tracker</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">"nginx"</span>
  <span class="hljs-attr">reporter:</span>
    <span class="hljs-attr">kind:</span> <span class="hljs-string">"slack"</span>
    <span class="hljs-attr">key:</span> <span class="hljs-string">"slack-api-key (will post link on how to)"</span>
    <span class="hljs-attr">channel:</span> <span class="hljs-string">"C0821GM4602 (channel ID)"</span>
</code></pre>
<p>This is a manifest where we can use <code>kubectl apply -f tracker.yaml</code> to apply this manifest and have our first pod-tracker object running!</p>
<p>Before doing <code>kubectl apply</code> we actually need to install the custom resource created in a local Kubernetes cluster. Make sure you have one running using <a target="_blank" href="https://kind.sigs.k8s.io"><mark>kind</mark></a> for example.</p>
<p>We can install the Custom Resource by executing <code>make install</code> inside the project directory.</p>
<p>If we execute the <code>kubectl apply</code> command above we’ll find out that we have a pod-tracker resource instance already up and running</p>
<p>Check by <code>kubectl get podtracker</code></p>
<p>However, it’s just a resource instance; it doesn’t do anything useful (for now). Now it’s time to add the brain to this resource using the second important file we mentioned: our controller file!</p>
<h2 id="heading-custom-controller">Custom Controller</h2>
<p>In <code>internal/controller/podtracker_controller.go</code> we should have a method called <code>Reconcile</code> that looks as follows:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(r *PodTrackerReconciler)</span> <span class="hljs-title">Reconcile</span><span class="hljs-params">(ctx context.Context, req ctrl.Request)</span> <span class="hljs-params">(ctrl.Result, error)</span></span> {

    <span class="hljs-keyword">return</span> ctrl.Result{}, <span class="hljs-literal">nil</span>
}
</code></pre>
<p><code>Reconcile</code> takes in what is known as a reconciliation request (basically, a request to trigger this method) and executes the logic inside to reconcile the custom resource back to the desired state.</p>
<p>It automatically gets called when events happen on <code>PodTracker</code> Resource (Creating, updating, deleting, etc)</p>
<p>What triggers the reconcile method is controlled by the second method we have here:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(r *PodTrackerReconciler)</span> <span class="hljs-title">SetupWithManager</span><span class="hljs-params">(mgr ctrl.Manager)</span> <span class="hljs-title">error</span></span> {
    <span class="hljs-keyword">return</span> ctrl.NewControllerManagedBy(mgr).
        For(&amp;podtrackerv1.PodTracker{}). <span class="hljs-comment">// Your primary resource</span>
        Watches(&amp;corev1.Pod{}, handler.EnqueueRequestsFromMapFunc(r.HandlePodEvents)).
        WithEventFilter(predicate.Funcs{
            CreateFunc: <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(e event.CreateEvent)</span> <span class="hljs-title">bool</span></span> {
                <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span> <span class="hljs-comment">// Process only create events</span>
            },
            UpdateFunc: <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(e event.UpdateEvent)</span> <span class="hljs-title">bool</span></span> {
                <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span> <span class="hljs-comment">// Ignore updates</span>
            },
            DeleteFunc: <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(e event.DeleteEvent)</span> <span class="hljs-title">bool</span></span> {
                <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span> <span class="hljs-comment">// Ignore deletions</span>
            },
            GenericFunc: <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(e event.GenericEvent)</span> <span class="hljs-title">bool</span></span> {
                <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span> <span class="hljs-comment">// Ignore generic events</span>
            },
        }).
        Complete(r)
}
</code></pre>
<p>In this method we basically watch for changes both in the PodTracker and Pod resources. Only create events are allowed to get processed and we discard the rest.</p>
<p>We pass the pod events to a method, <code>r.HandlePodEvents</code>, which finds the matching PodTracker object and enqueues a reconciliation request for it. The reconciler in turn does its job and sends a message to Slack saying that a new pod has been created.</p>
<p>This is what <code>HandlePodEvents</code> looks like:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(r *PodTrackerReconciler)</span> <span class="hljs-title">HandlePodEvents</span><span class="hljs-params">(ctx context.Context, o client.Object)</span> []<span class="hljs-title">ctrl</span>.<span class="hljs-title">Request</span></span> {
    <span class="hljs-comment">// Check if the object is a pod if not ignore</span>
    pod, ok := o.(*corev1.Pod)
    <span class="hljs-keyword">if</span> !ok {
        <span class="hljs-keyword">return</span> []ctrl.Request{}
    }
    <span class="hljs-comment">// check if the object lies in the kubernetes default namespace otherwise ignore</span>
    <span class="hljs-keyword">if</span> pod.Namespace != <span class="hljs-string">"default"</span> {
        <span class="hljs-keyword">return</span> []ctrl.Request{}
    }

    <span class="hljs-comment">// get the list of PodTracker objects</span>
    podTrackerList := &amp;podtrackerv1.PodTrackerList{}
    <span class="hljs-comment">// if none are found ignore.</span>
    <span class="hljs-keyword">if</span> err := r.List(ctx, podTrackerList); err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> []ctrl.Request{}
    }

    ctrlRequests := []ctrl.Request{}
    <span class="hljs-comment">// iterate over the list of PodTracker objects</span>
    <span class="hljs-keyword">for</span> _, podTracker := <span class="hljs-keyword">range</span> podTrackerList.Items {
        <span class="hljs-comment">// check if the PodTracker object is watching the pod</span>
        <span class="hljs-keyword">if</span> podTracker.Spec.Name == pod.Name {
            ctrlRequests = <span class="hljs-built_in">append</span>(ctrlRequests, ctrl.Request{NamespacedName: client.ObjectKeyFromObject(&amp;podTracker)})
        }
    }

    <span class="hljs-keyword">return</span> ctrlRequests
}
</code></pre>
<p>We simply just get the podTracker objects and check which one of them is responsible for managing the currently created pod. When we find it we enqueue a request to reconcile that specific podTracker using <code>NamespacedName</code> which is an object useful for passing to the reconciliation request. It contains the resource name and namespace.</p>
<p><code>ctrlRequests</code> can contain multiple reconciliation requests, in which case the <code>Reconcile</code> method is invoked once per request.</p>
<p>Now in the main reconciliation method I added this.</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(r *PodTrackerReconciler)</span> <span class="hljs-title">Reconcile</span><span class="hljs-params">(ctx context.Context, req ctrl.Request)</span> <span class="hljs-params">(ctrl.Result, error)</span></span> {
    fmt.Println(<span class="hljs-string">"Reconciling PodTracker"</span>)
    podTracker := &amp;podtrackerv1.PodTracker{}
    <span class="hljs-keyword">if</span> err := r.Get(ctx, req.NamespacedName, podTracker); err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> ctrl.Result{}, client.IgnoreNotFound(err)
    }

    <span class="hljs-comment">// send to slack an update that a pod with the name was created</span>
    lib.SlackSendMessage(podTracker.Spec.Reporter.Key, podTracker.Spec.Reporter.Channel, <span class="hljs-string">"Pod "</span>+podTracker.Spec.Name+<span class="hljs-string">" was created"</span>)

    <span class="hljs-keyword">return</span> ctrl.Result{}, <span class="hljs-literal">nil</span>
}
</code></pre>
<p>I check whether the to-be-reconciled PodTracker exists; if it does, I send a Slack message using a helper function I made, then return.</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">SlackSendMessage</span><span class="hljs-params">(key <span class="hljs-keyword">string</span>, channelID <span class="hljs-keyword">string</span>, message <span class="hljs-keyword">string</span>)</span></span> {
    api := slack.New(key)
    _, _, err := api.PostMessage(channelID, slack.MsgOptionText(message, <span class="hljs-literal">false</span>))
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        fmt.Println(err)
    }
}
<span class="hljs-comment">// just a simple function that sends a message to a channel</span>
</code></pre>
<p>You provide the slack credentials in the custom resource manifest we did earlier. To get these credentials make sure you create a new slack app and follow the instructions here:</p>
<ol>
<li><p>Install the go library for slack <code>slack-go/slack</code></p>
</li>
<li><p>Go to the Slack API Apps page.</p>
</li>
<li><p>Create a new app from scratch</p>
</li>
<li><p>Navigate to <strong>"OAuth &amp; Permissions"</strong> in your app's settings.</p>
</li>
<li><p>Add the necessary <strong>OAuth scopes</strong> for your app based on what it needs to do. In our case <code>chat:write</code></p>
</li>
<li><p>Go to <strong>"Install App"</strong> under the slack settings.</p>
</li>
<li><p>Click <strong>"Install App to Workspace"</strong>.</p>
</li>
<li><p>Authorize the app with your workspace.</p>
</li>
<li><p>After installation, you’ll see an <strong>OAuth token</strong> in the "OAuth &amp; Permissions" section.</p>
<ul>
<li><p>The token starts with <code>xoxb-</code> (for bot tokens) or <code>xoxp-</code> (for user tokens).</p>
</li>
<li><p>This will be your key.</p>
</li>
</ul>
</li>
<li><p>To get the channel id just check channel details in your slack app for the channel you want to write to it should be at the very bottom of channel details.</p>
</li>
</ol>
<p>If you got all these steps right, execute <code>make install</code> again to compile and install the updated resources, and <code>make run</code> to test the controller logic.</p>
<p>If we try to create an nginx pod using <code>kubectl run nginx --image=nginx</code></p>
<p>we should get a slack notification 🎉</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1732280710917/96fec197-0ae5-40b1-9f4c-f5e670e8e098.png" alt class="image--center mx-auto" /></p>
<p>However, our current code has a problem: if a new PodTracker is created, the reconcile method runs and sends a Slack message for the PodTracker resource creation itself. We only want to track created pods, so this behavior is unwanted.</p>
<p>To solve this we can use annotations! That’s where their power comes in. We can annotate PodTracker objects that actually need reconciliation because of a pod creation, and not because the PodTracker itself was created. We can update the <code>HandlePodEvents</code> method as follows:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(r *PodTrackerReconciler)</span> <span class="hljs-title">HandlePodEvents</span><span class="hljs-params">(ctx context.Context, o client.Object)</span> []<span class="hljs-title">ctrl</span>.<span class="hljs-title">Request</span></span> {
    pod, ok := o.(*corev1.Pod)
    <span class="hljs-keyword">if</span> !ok {
        <span class="hljs-keyword">return</span> []ctrl.Request{}
    }

    <span class="hljs-keyword">if</span> pod.Namespace != <span class="hljs-string">"default"</span> {
        <span class="hljs-keyword">return</span> []ctrl.Request{}
    }

    <span class="hljs-comment">// get the list of PodTracker objects</span>
    podTrackerList := &amp;podtrackerv1.PodTrackerList{}
    <span class="hljs-keyword">if</span> err := r.List(ctx, podTrackerList); err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> []ctrl.Request{}
    }

    ctrlRequests := []ctrl.Request{}
    <span class="hljs-comment">// iterate over the list of PodTracker objects</span>
    <span class="hljs-keyword">for</span> _, podTracker := <span class="hljs-keyword">range</span> podTrackerList.Items {
        <span class="hljs-comment">// check if the PodTracker object is watching the pod</span>
        <span class="hljs-keyword">if</span> podTracker.Spec.Name == pod.Name {
            <span class="hljs-keyword">if</span> podTracker.Annotations == <span class="hljs-literal">nil</span> {
                podTracker.Annotations = <span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">string</span>{}
            }
        <span class="hljs-comment">// add annotation to check for in the reconcilation</span>
            podTracker.Annotations[<span class="hljs-string">"triggered-by"</span>] = <span class="hljs-string">"pod"</span>
        <span class="hljs-comment">// update the kubectl cluster podtracker object with the new annotation</span>
            <span class="hljs-keyword">if</span> err := r.Update(ctx, &amp;podTracker); err != <span class="hljs-literal">nil</span> {
                log.FromContext(ctx).Error(err, <span class="hljs-string">"Failed to update PodTracker with annotations"</span>)
                <span class="hljs-keyword">continue</span>
            }
            ctrlRequests = <span class="hljs-built_in">append</span>(ctrlRequests, ctrl.Request{NamespacedName: client.ObjectKeyFromObject(&amp;podTracker)})
        }
    }

    <span class="hljs-keyword">return</span> ctrlRequests
}
</code></pre>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(r *PodTrackerReconciler)</span> <span class="hljs-title">Reconcile</span><span class="hljs-params">(ctx context.Context, req ctrl.Request)</span> <span class="hljs-params">(ctrl.Result, error)</span></span> {
    fmt.Println(<span class="hljs-string">"Reconciling PodTracker"</span>)
    podTracker := &amp;podtrackerv1.PodTracker{}
    <span class="hljs-keyword">if</span> err := r.Get(ctx, req.NamespacedName, podTracker); err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> ctrl.Result{}, client.IgnoreNotFound(err)
    }
    <span class="hljs-comment">// check the annotations if triggered by exists only send a message.</span>
    <span class="hljs-keyword">if</span> podTracker.Annotations != <span class="hljs-literal">nil</span> &amp;&amp; podTracker.Annotations[<span class="hljs-string">"triggered-by"</span>] == <span class="hljs-string">"pod"</span> {
    <span class="hljs-comment">// delete the annotation for cleanup</span>
        <span class="hljs-built_in">delete</span>(podTracker.Annotations, <span class="hljs-string">"triggered-by"</span>)
        <span class="hljs-keyword">if</span> err := r.Update(ctx, podTracker); err != <span class="hljs-literal">nil</span> {
            <span class="hljs-keyword">return</span> ctrl.Result{}, fmt.Errorf(<span class="hljs-string">"failed to clear annotation: %w"</span>, err)
        }
        lib.SlackSendMessage(podTracker.Spec.Reporter.Key, podTracker.Spec.Reporter.Channel, <span class="hljs-string">"Pod "</span>+podTracker.Spec.Name+<span class="hljs-string">" was created"</span>)
    }

    <span class="hljs-keyword">return</span> ctrl.Result{}, <span class="hljs-literal">nil</span>
}
</code></pre>
<p>And voila! now we only send slack messages of newly created pods.</p>
<h1 id="heading-summary">Summary</h1>
<p>The main goal of an article like this is that it’s targeted at backend developers with little to no knowledge about operators. Because let’s be honest, it’s something we might finish our careers without ever touching. It’s just an attempt from me to ease the understanding of concepts that I personally struggled with. The goal was, without diving too deep, to build a simple use case that clearly explains the idea. Hopefully I delivered what I wanted. Also, I might turn this into a YouTube series instead; if anyone wants that, let me know! Till the next one.</p>
]]></content:encoded></item><item><title><![CDATA[Software Architecture - The Hard Parts [Chapter 10] Distributed Data Access]]></title><description><![CDATA[Introduction
In this chapter we’re going to be diving into the different ways services can read data they do not own, in monolithic systems using a single database, developers don’t give a second thought to reading database tables but when data is br...]]></description><link>https://hewi.blog/software-architecture-the-hard-parts-chapter-10-distributed-data-access</link><guid isPermaLink="true">https://hewi.blog/software-architecture-the-hard-parts-chapter-10-distributed-data-access</guid><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[AI]]></category><category><![CDATA[wasm]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Sun, 10 Nov 2024 10:12:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1731233522278/b26217a0-5344-4808-8444-f862ef47b85e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>In this chapter we’re going to be diving into the different ways services can read data they <strong>do not own</strong>. In monolithic systems using a single database, developers don’t give a second thought to reading database tables, but when data is broken into separate databases owned by distinct services, data access for read operations becomes complex.</p>
<h1 id="heading-data-access-patterns">Data Access Patterns</h1>
<p>Below are some of the most commonly used data access patterns so that a service can access data it doesn’t own.</p>
<h2 id="heading-interservice-communication-pattern">Interservice Communication Pattern</h2>
<p>This is by far the most common pattern for accessing data, if one service needs data it doesn’t have direct access to it simply asks the owning service for it by using some sort of remote access protocol.</p>
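<p>As a concrete illustration, the calling service simply makes a remote call to the owning service. Here’s a minimal Go sketch (the service name and endpoint are hypothetical); the trade-offs are summarized in the table below:</p>
<pre><code class="lang-go">import (
    "encoding/json"
    "net/http"
    "time"
)

type Customer struct {
    ID   string `json:"id"`
    Name string `json:"name"`
}

// fetchCustomer asks the (hypothetical) customer-service for data this
// service doesn't own. Note the coupling: if customer-service is down
// or slow, this call fails or stalls along with it.
func fetchCustomer(id string) (*Customer, error) {
    client := &amp;http.Client{Timeout: 2 * time.Second} // network latency sits on the request path
    resp, err := client.Get("http://customer-service/customers/" + id)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    var c Customer
    if err := json.NewDecoder(resp.Body).Decode(&amp;c); err != nil {
        return nil, err
    }
    return &amp;c, nil
}
</code></pre>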
<div class="hn-table">
<table>
<thead>
<tr>
<td>Pros</td><td>Cons</td></tr>
</thead>
<tbody>
<tr>
<td>Simplicity</td><td>Slower performance due to latency, especially if a user’s request is dependent on having this style of communication in the business request. Latencies mainly include <mark>network, security and data latencies</mark></td></tr>
<tr>
<td>No data volume issues (direct service calls)</td><td>Services are tightly coupled together, one service must rely on the other service being available so that it can fulfill its needs. The absence of the service that has the data directly impacts the calling service. Also they must scale together since they are tightly coupled to meet high demand.</td></tr>
</tbody>
</table>
</div><h2 id="heading-column-schema-replication-pattern">Column Schema Replication Pattern</h2>
<p>Here, columns are replicated across tables, thereby replicating the data and making it available to other bounded contexts.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Pros</td><td>Cons</td></tr>
</thead>
<tbody>
<tr>
<td>The service requiring read access has immediate access to the data, which increases performance, fault tolerance, and scalability.</td><td>Data synchronization and consistency: both copies of the columns must be kept in sync, typically through asynchronous communication.</td></tr>
<tr>
<td>Very useful in some data aggregation and reporting scenarios.</td><td>Very hard to govern data ownership: since the data is replicated, other services can update it even though they don’t officially own it, which can cause <strong>data consistency issues</strong>.</td></tr>
</tbody>
</table>
</div><h2 id="heading-replicated-caching-pattern">Replicated Caching Pattern</h2>
<p>While caching is a well-known pattern for increasing performance and responsiveness, it can also be an effective tool for distributed data access and sharing: by leveraging in-memory caching, data needed by other services is made available to each service without them having to ask for it.</p>
<p>There exist different models of caching between services. The basic one is <mark>in-memory caching</mark>, where each service has its own cache separate from other services. This isn’t really useful for sharing data between services because of the lack of synchronization between them.</p>
<p>Another caching model is <mark>distributed caching</mark>, where data is held externally in a caching server and services make requests to that server to retrieve or update the shared cache. However, it’s not that useful for data access for the following reasons:</p>
<ol>
<li><p>No different from the interservice communication pattern (the caller is still tightly coupled, just to a caching server instead of a service)</p>
</li>
<li><p>Different services can update the data, breaking the bounded context regarding data ownership, which can cause inconsistencies between the caches and the owning database.</p>
</li>
<li><p>Latency issues, since the cache is accessed through network calls as described earlier.</p>
</li>
</ol>
<p>Another model is <mark>replicated caching</mark>, where each service has its own in-memory data that is kept in sync between the services, allowing the same data to be shared across multiple services. Any update made to the cache is asynchronously propagated to the caches in the other services.</p>
<p>Of the caching models mentioned, replicated caching is the most suitable for addressing distributed data access.</p>
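<p>As a conceptual illustration (not tied to any particular caching product), the owning service broadcasts its changes over a message bus and every replica applies them locally, so reads never leave the process. The <code>bus</code> object with <code>publish</code>/<code>subscribe</code> methods is an assumption:</p>
<pre><code class="lang-python">import json

class ReplicatedCache:
    """Minimal sketch of a replicated cache: each service holds an
    in-memory replica; updates are propagated asynchronously."""

    def __init__(self, bus, topic="cache-updates"):
        self.data = {}   # this service's local in-memory replica
        self.bus = bus   # hypothetical pub/sub client
        self.topic = topic
        bus.subscribe(topic, self.apply_update)

    def put(self, key, value):
        # Only the cache-owning service should call this; the other
        # services receive the change through the bus instead.
        self.data[key] = value
        self.bus.publish(self.topic, json.dumps({"key": key, "value": value}))

    def apply_update(self, message):
        # Called asynchronously on every replica when an update arrives.
        update = json.loads(message)
        self.data[update["key"]] = update["value"]

    def get(self, key):
        return self.data.get(key)  # no network call on the read path
</code></pre>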
<div class="hn-table">
<table>
<thead>
<tr>
<td>Pros</td><td>Cons</td></tr>
</thead>
<tbody>
<tr>
<td>Services have their own in-memory replica, so they no longer need to make external calls to other services for data.</td><td>Service dependency with regard to the cache data and startup timing: the service holding the replicated cache must start after the service owning the cache. If the owning service is unavailable, the other service has to wait until the cache gets filled. This is only a startup problem, though.</td></tr>
<tr>
<td>Updates made by the cache-owning service are reflected in all other services containing the replica cache.</td><td>If the volume of data is too high, the feasibility of this pattern diminishes quickly. Also, every service instance has its own replicated cache, so if 5 instances are required, that’s the cache size multiplied by 5. Careful analysis must take place here so the caches don’t hog all of the memory resources.</td></tr>
<tr>
<td>Greatly responsive, fault tolerant and scalable</td><td>Very hard to keep the caches fully in sync if the rate of change of the data is too high. The pattern is more suited for relatively static data (data that doesn’t change that often)</td></tr>
<tr>
<td>Can scale independently.</td><td>Configuration and setup management: configuring this replication mechanism across services is not that straightforward.</td></tr>
</tbody>
</table>
</div><h2 id="heading-data-domain-pattern">Data Domain Pattern</h2>
<p>In a previous chapter, a way to resolve joint ownership was to have both services share ownership of the database. The same pattern can be used for data access too.</p>
<p>The solution is to create a data domain, combining multiple tables into a shared schema accessible to both services needing the data, which makes for a broader bounded context.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Pros</td><td>Cons</td></tr>
</thead>
<tbody>
<tr>
<td>Services completely decoupled from each other</td><td>Sharing data is generally discouraged in distributed architectures</td></tr>
<tr>
<td>Very high data consistency and integrity (no need for replication, synchronization, etc.)</td><td>Since multiple services have direct access to the schema, any schema change directly impacts these services, and they have to change accordingly.</td></tr>
<tr>
<td>No additional contracts needed to transfer data between services.</td><td>Potential security pitfalls concerning data access, since every service has complete access to all the data in that domain.</td></tr>
</tbody>
</table>
</div><h1 id="heading-summary">Summary</h1>
<p>In this chapter we went through some of the popular data access patterns, where one service basically needs data another service owns. As for the trade-offs of each one and the answer to “which one should I pick”, it’s a big “it depends” as always 🤣</p>
<p>In the next chapter we’re going to be talking about some famous distributed architecture sagas! Stay tuned</p>
]]></content:encoded></item><item><title><![CDATA[Software Architecture - The Hard Parts [Chapter 9] Data Ownership and Distributed Transactions [Part 2]]]></title><description><![CDATA[Hey folks! This is the second part to the previous article where we discussed data ownership and which data belongs to which service. In this part we’ll dive into distributed transactions and talk specifics. If you didn’t read the first part make sur...]]></description><link>https://hewi.blog/software-architecture-the-hard-parts-chapter-9-data-ownership-and-distributed-transactions-part-2</link><guid isPermaLink="true">https://hewi.blog/software-architecture-the-hard-parts-chapter-9-data-ownership-and-distributed-transactions-part-2</guid><category><![CDATA[Software Engineering]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Sat, 02 Nov 2024 13:39:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1730554756504/918e3b25-5e0d-449f-86b4-874b706e6958.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey folks! This is the second part to the previous article where we discussed data ownership and which data belongs to which service. In this part we’ll dive into distributed transactions and talk specifics. If you didn’t read the first part make sure you do so <a target="_blank" href="https://hashnode.com/post/cm2qgwjoq000009l5fpmxcs72">here</a>. Let’s dive in.</p>
<h1 id="heading-introduction">Introduction</h1>
<p>When architects think about transactions, they usually think about a single atomic unit of work where multiple database updates are either committed together or all rolled back when an error occurs. ACID is an acronym describing the basic properties of an atomic single unit of work database transaction (atomicity, consistency, isolation and durability)</p>
<p>Let’s first briefly talk about ACID before moving on to distributed transactions.</p>
<h1 id="heading-acid-properties">ACID Properties</h1>
<p><strong>Atomicity</strong> means a transaction must either commit or rollback all of its updates in a single unit of work. All updates are treated as a collective whole so all changes either get committed or rolled back as one unit.</p>
<p><strong>Consistency</strong> means that during the course of the transaction the database would never be left in an inconsistent state or violate any integrity constraints specified in the database.</p>
<p><strong>Isolation</strong> refers to the degree of which individual transactions interact with each other. It protects uncommitted transaction data from being visible to other transactions during the course of the business request.</p>
<p><strong>Durability</strong> means that once a successful response from a transaction commit occurs, it is guaranteed that all the data updates are permanent regardless of further system failures.</p>
<p>ACID can exist within the context of each service in a distributed architecture, but only if the corresponding database supports ACID properties: each service can perform its own commits and rollbacks on the tables it owns within the scope of the atomic business transaction. <strong>However</strong>, if the business request spans multiple services, the entire business request cannot be an ACID transaction; rather, it becomes a <mark>distributed transaction.</mark></p>
<p><strong>Distributed transactions</strong> occur when an atomic business request containing multiple database updates is performed by separately deployed remote services.</p>
<p><strong>Distributed transactions</strong> DO NOT SUPPORT THE ACID PROPERTIES</p>
<p><strong>Atomicity</strong> is not supported because each service commits its own data and performs only one part of the overall atomic business request. So atomicity is bound to the service not the entire request.</p>
<p><strong>Consistency</strong> is not supported because a failure in one service causes the data to be out of sync between the tables responsible for the business request.</p>
<p><strong>Isolation</strong> is not supported because each service commits its part of the data independently; once committed, that data is visible to other requests even though the overall business transaction is still in progress.</p>
<p><strong>Durability</strong> is not supported because, as mentioned before, it applies per service: there are multiple databases, and anything could go wrong in any of them. It is supported for each individual service, though.</p>
<h1 id="heading-distributed-transactions">Distributed Transactions</h1>
<p>Instead of ACID, distributed transactions support something called <strong>BASE</strong>.</p>
<p>Completely opposite to ACID, BASE stands for:</p>
<ol>
<li><p>Basic availability</p>
</li>
<li><p>Soft State</p>
</li>
<li><p>Eventual Consistency</p>
</li>
</ol>
<p><strong>Basic availability</strong> means all the involved services are expected to be available when a distributed transaction is pending.</p>
<p><strong>Soft State</strong> describes a situation where a distributed transaction is in progress and the state of the atomic business request is not yet completed (or in some cases not even known): some services have committed their part while others haven’t, or there is a wait until we get an acknowledgment that everything has worked (or not).</p>
<p><strong>Eventual consistency</strong> means that given enough time, all parts of the distributed transaction will complete successfully and all of the data is in sync with one another.</p>
<p>Moving on we’ll now talk about the patterns involved in eventual consistency and the caveats of each.</p>
<h2 id="heading-eventual-consistency">Eventual Consistency</h2>
<p>Distributed architectures rely heavily on eventual consistency as a trade-off for better operational characteristics such as performance, scalability, elasticity, fault tolerance and availability. There are numerous ways to achieve eventual consistency, but there are three main patterns in use today:</p>
<ol>
<li><p>Background synchronization pattern</p>
</li>
<li><p>Orchestrated request-based pattern</p>
</li>
<li><p>Event based pattern</p>
</li>
</ol>
<p>Let’s dive into them.</p>
<h3 id="heading-background-synchronization-pattern">Background synchronization pattern</h3>
<p>The background synchronization pattern uses a separate or external service or process to periodically check the data sources and keep them in sync with one another. The time for data to become eventually consistent depends on the nature of the background synchronization process, whether it is a batch job running at night, a job running every hour, etc.</p>
<p>This pattern has the <strong>longest length of time</strong> for data to become consistent. However, in some cases the data doesn’t need to be in sync right away.</p>
<p>One of the challenges of this pattern is that the process must know what data has changed, which can be determined in different ways, such as querying the source tables, a database trigger, or an event stream. Most importantly, it must have knowledge of all tables and data sources involved in the transaction.</p>
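<p>For illustration, such a background process often boils down to something like the sketch below, which assumes SQLite-style connections and an <code>updated_at</code> column on the source table for change detection:</p>
<pre><code class="lang-python">import time
from datetime import datetime, timezone

def sync_once(source_db, target_db, last_run_iso):
    # Pull rows that changed since the previous run. Note the coupling:
    # this process needs read and write access to databases it doesn't own.
    changed = source_db.execute(
        "SELECT id, data, updated_at FROM products WHERE updated_at > ?",
        (last_run_iso,),
    ).fetchall()
    for row in changed:
        target_db.execute(
            "INSERT OR REPLACE INTO products (id, data, updated_at) VALUES (?, ?, ?)",
            row,
        )
    target_db.commit()

def run_forever(source_db, target_db, interval_seconds=3600):
    last_run = "1970-01-01T00:00:00+00:00"  # sync everything on the first pass
    while True:
        now = datetime.now(timezone.utc).isoformat()
        sync_once(source_db, target_db, last_run)
        last_run = now
        time.sleep(interval_seconds)  # data only converges once per interval
</code></pre>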
<p>As efficient as this pattern is, it has some serious tradeoffs:</p>
<ol>
<li><p>All of the data sources are coupled together, breaking every bounded-context rule between the data and the services. The background job needs access to the different databases to achieve the eventual synchronization of the distributed transaction, and it must have write access, meaning the background process shares ownership of the tables with the services that own them.</p>
</li>
<li><p>Might lead to duplicate business logic, because what the background job does might already be implemented in the services responsible for each table.</p>
</li>
</ol>
<p>This pattern isn’t suitable for distributed architectures requiring tight bounded contexts (microservices), where the coupling between data ownership and functionality is a critical part of the architecture.</p>
<h3 id="heading-orchestrated-request-based-pattern">Orchestrated request-based pattern</h3>
<p>A common approach for managing distributed transactions is to make sure all of the data sources are in sync during the course of the business request (while the end user is waiting).</p>
<p>This pattern attempts to process the entire business transaction during the course of the business request. Therefore requiring some sort of orchestrator to manage the distributed transaction.</p>
<p>The orchestrator is responsible for managing all of the work needed to process the request, including knowledge of the business process, knowledge of the participants involved, multicasting logic, error handling and contract ownership.</p>
<p>One of the common ways to implement this is to designate a primary service to manage the distributed transaction. Although this approach avoids the need for a separate orchestration service, it tends to overload the designated service: in addition to the role of orchestrator, the service must perform its own responsibilities as well. This approach also leads to tight coupling and synchronous dependencies between services.</p>
<p>Using a dedicated orchestration service for the business request is a better approach here.</p>
<p>As efficient as this pattern is, it has some serious tradeoffs:</p>
<ol>
<li><p>Favors consistency over overall responsiveness.</p>
</li>
<li><p>Really complex error handling: if one service fails, you have to reverse what the transaction performed on the others (a compensating transaction, see the sketch after this list)</p>
</li>
<li><p>Failures might occur even during compensation, which leaves data out of sync and needing human intervention to repair.</p>
</li>
</ol>
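<p>To make the compensation idea concrete, here is a minimal orchestrator sketch; each step pairs an action with its compensating action, and the service clients in the usage comment are hypothetical:</p>
<pre><code class="lang-python">def run_distributed_transaction(steps):
    """Each step is a (do, undo) pair of callables. On failure,
    already-completed steps are compensated in reverse order."""
    completed = []
    for do, undo in steps:
        try:
            do()
            completed.append(undo)
        except Exception:
            # Compensation itself can fail, which is exactly the
            # "data out of sync, human intervention" tradeoff above.
            for compensate in reversed(completed):
                try:
                    compensate()
                except Exception:
                    alert_operator()  # hypothetical escalation hook
            raise

# Hypothetical usage with three participating services:
# run_distributed_transaction([
#     (payment.charge, payment.refund),
#     (inventory.reserve, inventory.release),
#     (shipping.schedule, shipping.cancel),
# ])
</code></pre>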
<h3 id="heading-event-based-pattern">Event based pattern</h3>
<p>This pattern is one of the most popular and reliable eventual consistency patterns for modern distributed architectures. Events are used in conjunction with an asynchronous publish-and-subscribe messaging model to post events or command messages to a topic or event stream; the services involved in the transaction listen for these events and respond to them.</p>
<p>The eventual consistency time is usually short because of the parallel and decoupled nature of asynchronous message processing. Services are highly decoupled from one another, and responsiveness is good because the service triggering the eventual consistency doesn’t have to wait for the data synchronization to finish before returning a response to the customer.</p>
<p>The main tradeoff here is that failure handling becomes complex: what happens if a consumer fails while processing? Most brokers will try a number of times to deliver a message, and after repeated failures they will send it to a dead letter queue, from which it is either automatically repaired or requires human intervention. A sketch of this retry-then-dead-letter behavior follows.</p>
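<p>Roughly, the broker-side behavior looks like this sketch (the attempt limit and the dead letter queue object are assumptions; real brokers make both configurable):</p>
<pre><code class="lang-python">MAX_ATTEMPTS = 5  # assumed; brokers typically make this configurable

def deliver(message, handler, dead_letter_queue):
    """Retry a failing consumer a few times, then park the message
    where automation or a human can deal with it."""
    last_error = None
    for _attempt in range(MAX_ATTEMPTS):
        try:
            handler(message)
            return  # processed successfully; this service is now in sync
        except Exception as error:
            last_error = error
    # Repeated failures: hand the message to the dead letter queue
    # for automated repair or human intervention.
    dead_letter_queue.put((message, str(last_error)))
</code></pre>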
<h1 id="heading-summary">Summary</h1>
<p>In distributed systems, traditional ACID transactions don’t work due to multiple services each handling their own data, leading to distributed transactions that rely on the BASE model: Basic Availability, Soft State, and Eventual Consistency. To handle eventual consistency, three main patterns exist:</p>
<ol>
<li><p><strong>Background Synchronization</strong>: Runs periodic jobs to sync data across services but can cause delays and duplicate business logic.</p>
</li>
<li><p><strong>Orchestrated Request-Based Pattern</strong>: Uses a central orchestrator to ensure all data is consistent during a request, favoring consistency at the cost of complex error handling.</p>
</li>
<li><p><strong>Event-Based Pattern</strong>: Services respond to asynchronous events, allowing for quick, decoupled syncing. Failures may result in dead letter queues, needing human intervention at times.</p>
</li>
</ol>
<p>Each pattern has its tradeoffs in terms of consistency, speed, and complexity.</p>
<p>That’s it for this chapter and watch out for the next one where we’ll be talking about <strong>distributed data access.</strong> Hope you enjoyed!</p>
]]></content:encoded></item><item><title><![CDATA[Software Architecture - The Hard Parts [Chapter 9] Data Ownership and Distributed Transactions [Part 1]]]></title><description><![CDATA[Introduction
In this part, we’ll go through the changes that happen mainly to data once a monolithic system has been pulled apart into separate services each with its own domain. Every service abides by the bounded context rule in Domain Driven Desig...]]></description><link>https://hewi.blog/software-architecture-the-hard-parts-chapter-9-data-ownership-and-distributed-transactions-part-1</link><guid isPermaLink="true">https://hewi.blog/software-architecture-the-hard-parts-chapter-9-data-ownership-and-distributed-transactions-part-1</guid><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[backend]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Sat, 26 Oct 2024 18:01:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1729965649670/96c35467-030d-49ce-be07-f5acb0b6d5dc.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>In this part, we’ll go through the changes that happen mainly to data once a monolithic system has been pulled apart into separate services each with its own domain. Every service abides by the bounded context rule in Domain Driven Design where each domain has its own application code and data together.</p>
<p>I’ve written several articles about this book discussing its different chapters. I’d recommend checking them out to make sure you have a complete understanding of what’s going on. In a nutshell, we’re pulling apart a huge monolithic application into several coarse-grained services, and pulling apart the data as well.</p>
<p>When data is pulled apart, it needs to be stitched back together to make the system work. The main hiccups are figuring out which service owns what data, how to manage distributed transactions, and how a service can access data it needs but doesn’t own.</p>
<h1 id="heading-assigning-data-ownership">Assigning Data Ownership</h1>
<p>The main question that arises here is: which service owns which data?</p>
<p>A general rule of thumb for assigning table ownership is that services that perform write operations to a table <strong>own</strong> that table. This works well if a single service writes to the table but it gets messy when multiple services have to write to the same table.</p>
<p>A simple example is as follows;</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729946353642/2ef1967b-7017-4d84-8a3b-59fe91a37780.png" alt="Example of services and data intercommunication" class="image--center mx-auto" /></p>
<p>We have 3 services and 3 databases. As we can see, all 3 services write to the <strong>Audit Table</strong>.</p>
<p>The catalog service writes to the product table, and so does the inventory service.</p>
<p>These overlaps make assigning data ownership a very complex task.</p>
<p>There are 3 common scenarios encountered when assigning data ownership to services:</p>
<ol>
<li><p>Single ownership</p>
</li>
<li><p>Common ownership</p>
</li>
<li><p>Joint ownership</p>
</li>
</ol>
<p>We’re going to dive into them and explore several techniques for resolving these scenarios</p>
<h2 id="heading-single-ownership">Single Ownership</h2>
<p>Occurs when only one service writes to a table. Very straightforward and very easy to resolve. In the example above, the wishlist service is the only one that writes to the wishlist table, making it a single ownership scenario.</p>
<p>Knowing this, we can conclude that the wishlist table is part of the bounded context of the wishlist service.</p>
<blockquote>
<p><em>It’s a lot easier to address single table relationships first when approaching these kinds of problems before moving on to complex scenarios</em></p>
</blockquote>
<h2 id="heading-common-ownership">Common Ownership</h2>
<p>Occurs when most or all of the services need to write to the same table. Looking back at our example, we see all 3 services writing to the audit table. Since all 3 write to it, it’s really difficult to say which service owns this table domain-wise.</p>
<p>A proposed solution is to put the audit table in a shared database or schema since it receives writes from everywhere. However this has its own set of problems:</p>
<ol>
<li><p>Change control: if the schema changes, you’ll have to revisit all the services writing to it and update them accordingly</p>
</li>
<li><p>Connection starvation due to the number of services connecting to it</p>
</li>
<li><p>Scalability and fault tolerance: if the shared database goes down, it causes chaos, with a lot of services encountering issues related to auditing</p>
</li>
</ol>
<p>A popular technique for addressing this is to create a <strong>dedicated auditing service that owns the audit table.</strong></p>
<p>Any service that needs to write audits goes through the audit service instead of accessing the database directly. This has a huge impact on the design and brings a lot of benefits:</p>
<ol>
<li><p>If no acknowledgment is required, a buffer (queue) can sit between the services and the audit service, letting it process audits at its own pace (see the sketch after this list)</p>
</li>
<li><p>The design also becomes more fault tolerant, and it’s easier to scale the audit service independently.</p>
</li>
</ol>
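<p>Here is a minimal sketch of that queue-based design, using Python’s in-process <code>queue</code> module as a stand-in for a real message broker and a hypothetical <code>audit_db</code> client:</p>
<pre><code class="lang-python">import json
import queue

audit_queue = queue.Queue()  # stand-in for a real message broker

def record_audit(service_name, action):
    # Any service drops an audit event on the queue and moves on;
    # no acknowledgment from the audit service is awaited.
    audit_queue.put(json.dumps({"service": service_name, "action": action}))

def audit_worker(audit_db):
    # The audit service owns the audit table and drains the queue
    # at its own pace, decoupled from the producers.
    while True:
        event = json.loads(audit_queue.get())
        audit_db.insert("audit", event)  # hypothetical database client
        audit_queue.task_done()

# e.g. run the worker inside the audit service:
# threading.Thread(target=audit_worker, args=(audit_db,), daemon=True).start()
</code></pre>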
<p>Applying what was said, our design should look like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729947199142/36c49bf7-a068-46a5-9a00-26c6cf7921d3.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-joint-ownership">Joint Ownership</h2>
<p>Occurs just like common ownership, except that only a <strong>couple</strong> of services within the same domain write to the same table, not most of them as previously described for common ownership.</p>
<p>Looking back at our first example, only the Catalog and Inventory services perform write operations on the product table.</p>
<p>There exists several techniques to solve this ownership problem</p>
<ol>
<li><p>Table split</p>
</li>
<li><p>Data Domain</p>
</li>
<li><p>Delegation</p>
</li>
<li><p>Service Consolidation</p>
</li>
</ol>
<p>Let’s discuss them one by one.</p>
<h3 id="heading-table-split-technique">Table Split Technique</h3>
<p>The table split technique breaks the table into multiple tables, where each service owns the part of the data it’s responsible for.</p>
<p>Looking at our example: if we can break the product table into 2 tables, where the inventory service owns the data it manipulates and the catalog service owns the rest, we’ve applied the table split technique. This highly depends on the nature of what the inventory service writes; if it only updates counts, for example, we can extract that column into its own table with a product id foreign key as a reference, so the inventory service has its own table.</p>
<p>This moves the joint ownership to single table ownership. The overhead, however, is the ongoing communication between both services to ensure the data is synced correctly and remains in a consistent state.</p>
<p>If a new product is added, for example, the catalog service needs to communicate that to the inventory service, sending it the id and inventory counts; if a product is removed, vice versa.</p>
<p>But a lot of questions arise when syncing data between 2 tables:</p>
<ol>
<li><p>Should the communication be synchronous or asynchronous between both services?</p>
</li>
<li><p>What happens if the catalog service wants to communicate with the inventory service and finds that it’s not available? It’s an availability versus consistency question</p>
</li>
</ol>
<p>Choosing availability means that the catalog service must always be able to add or remove products, regardless of whether the inventory service is up or not.</p>
<p>Choosing consistency means that adding or removing fails if either of the two services is down.</p>
<p>So it depends on the business requirements; knowing what you need is what lets you make the decision.</p>
<h3 id="heading-delegate-technique">Delegate Technique</h3>
<p>In this method, one service is assigned single ownership of the table and becomes the delegate. Any other service communicates with the delegate to perform updates on its behalf.</p>
<p>The main challenge here is knowing which service to assign as the delegate (the sole owner of the table). We have two options:</p>
<ol>
<li><p><strong>Primary domain priority</strong></p>
 <p> Where we assign the table to the service that most closely represents the primary domain of the data (the service that does most of the CRUD operations for an entity in that domain)</p>
</li>
<li><p><strong>Operational characteristics</strong></p>
<p> Assigning the table to the service needing higher operational characteristics such as performance, scalability, availability and throughput</p>
</li>
</ol>
<p>If we look at the following joint ownership scenario</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729948285626/d7fccfb0-5ceb-4ea8-95ff-1531c3a72e08.png" alt class="image--center mx-auto" /></p>
<p>The catalog service performs most of the CRUD operations on the product table, because it creates, updates and removes products and retrieves product information, while the inventory service is responsible for retrieving and updating the inventory count, as well as knowing when to restock if the inventory count is too low.</p>
<p>Applying the <strong>primary domain priority</strong> technique results in the following:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729948415825/744a56b3-71e0-45de-9beb-b58519080793.png" alt class="image--center mx-auto" /></p>
<p>The catalog service would be assigned as the single owner of the table. The Inventory service must communicate with the catalog service to access that table.</p>
<p>Delegate techniques always force interservice communication, requiring the services to talk to each other to update data. The type of communication is key here: with <strong>synchronous communication</strong>, the inventory service must wait for the inventory to be updated by the catalog service, which impacts performance but ensures data consistency. Using <strong>asynchronous communication</strong> boosts performance but makes the data eventually consistent.</p>
<p>With the <strong>operational characteristics priority</strong> option, the ownership would be reversed, because inventory updates occur at a much faster rate than changes to static product data. In this case, ownership would be assigned to the inventory service.</p>
<p>With this option, updates to the inventory can use direct database calls instead of remote access protocols, making inventory operations much faster and keeping the most volatile data (the inventory counts) consistent.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729963895951/54645067-7069-4cbe-9335-7627fd586474.png" alt class="image--center mx-auto" /></p>
<p>However, one major problem here is the domain management responsibility: the inventory service is responsible for managing inventory counts, not for creating, updating and deleting products (and potentially the associated error management too).</p>
<h3 id="heading-service-consolidation-technique">Service Consolidation Technique</h3>
<p>The delegate approach highlights the primary issue with joint ownership: <strong>service dependency</strong>.</p>
<p>The service consolidation technique solves this by combining multiple table owners into a single consolidated service.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729964543607/138c4ac9-de21-4e35-b5e0-94a0b0397cc5.png" alt class="image--center mx-auto" /></p>
<p>Combining multiple services into one creates a <strong>coarse-grained service</strong>, which increases the testing scope as well as the deployment risk (breaking something else in the service when a new feature is added or a bug is fixed). Consolidation might also affect the overall fault tolerance of the system, since the whole service fails together.</p>
<p>Another caveat is that both parts now have to scale together, even if it isn’t necessary for one of them to do so.</p>
<h1 id="heading-summary"><strong>Summary</strong></h1>
<p>This part covered different data ownership scenarios and techniques used to choose which service owns which data.</p>
<p>Summarizing everything we said, the design should look like this now:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729965196417/e5154018-7254-47ed-8b99-3cf5aee30485.png" alt class="image--center mx-auto" /></p>
<ol>
<li><p>We used <strong>single table ownership</strong> for the wishlist service</p>
</li>
<li><p>For <strong>common ownership</strong>, we created an audit service, with all other services sending it messages via a queue (asynchronously).</p>
</li>
<li><p>Finally, for the <strong>joint ownership</strong> between the catalog and inventory services over the product table, we chose the <strong>delegate technique with domain priority</strong>, assigning the table to the catalog service, with the inventory service sending update requests to it.</p>
</li>
</ol>
<p>That’s it for part 1. In the next part we’ll go through distributed transactions and the caveats that happen when data is pulled apart. Stay tuned!</p>
]]></content:encoded></item><item><title><![CDATA[A Bird's-Eye View of Amazon Aurora's Amazing Architecture]]></title><description><![CDATA[In this article i’m going to be simply explaining the architecture Amazon’s well known relational database service Aurora; dive deep into why some decisions were made and the impact they had. I’m going to be abstracting a lot of information just so y...]]></description><link>https://hewi.blog/a-birds-eye-view-of-amazon-auroras-amazing-architecture</link><guid isPermaLink="true">https://hewi.blog/a-birds-eye-view-of-amazon-auroras-amazing-architecture</guid><category><![CDATA[AWS]]></category><category><![CDATA[aurora]]></category><category><![CDATA[Relational Database]]></category><category><![CDATA[Databases]]></category><category><![CDATA[MySQL]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Sat, 05 Oct 2024 11:55:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1728129310915/0907ccb0-1b1b-4534-a561-fd4b7981b392.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this article i’m going to be simply explaining the architecture Amazon’s well known relational database service Aurora; dive deep into why some decisions were made and the impact they had. I’m going to be abstracting a lot of information just so you can get the general idea. I’ll try to simplify it as much as possible so you don’t have to have deep knowledge to be able to grasp the concept 😊. Enjoy</p>
<h1 id="heading-introduction">Introduction</h1>
<p>Amazon Aurora is a relational database service for OLTP (Online Transactional Processing) workloads offered as part of Amazon Web Services (AWS).</p>
<p>To be able to grasp the reasoning behind Aurora let’s take it back to what a normal relational database does.</p>
<p>In a traditional relational database system, <strong>each database server</strong> (the machine running the database) does all the work:</p>
<ol>
<li><p><strong>Processing Queries</strong>: It handles reading and writing data.</p>
</li>
<li><p><strong>Storing Data</strong>: It saves data locally or on disk.</p>
</li>
<li><p><strong>Handling Failures</strong>: If a crash happens, the same server needs to recover data, replay logs, and bring everything back online.</p>
</li>
</ol>
<p>This setup means each server is responsible for <strong>both compute (processing) and storage</strong>, which can create bottlenecks, especially in high-throughput systems. Network traffic is high because servers need to constantly synchronize data between each other to avoid data loss.</p>
<p>The <strong>I/O bottleneck</strong> usually happens because a single server has to handle both <strong>compute</strong>(processing) and <strong>storage</strong> (reading/writing data to disk). This can overwhelm the server's disk, causing performance slowdowns—especially when there's heavy load. But in a cloud environment like AWS Aurora, things work a bit differently.</p>
<h1 id="heading-auroras-architecture">Aurora’s Architecture</h1>
<h2 id="heading-the-skeleton">The Skeleton</h2>
<p>Aurora separates the <strong>storage service</strong> from the <strong>database instances</strong>. This storage service manages functions like <strong>redo logging, crash recovery, and backups</strong> independently, rather than being tightly integrated with each database instance like in traditional systems.</p>
<p>So, in other words, it split the processing from the storage completely, resulting in processing servers without the overhead of storing the data.</p>
<p>Instead of relying on a single disk or server, Aurora <strong>spreads out</strong> storage across many servers (called the "storage fleet"). This means that no single disk or server is overloaded, as the storage load is <strong>distributed</strong>. However, this introduces a new bottleneck: the <strong>network</strong>.</p>
<p>The database needs to send <strong>requests over the network</strong> to the storage servers to read or write data.</p>
<p>Even though the data is spread across many servers, the database must communicate with several of them at once, creating a lot of <strong>network traffic</strong>.</p>
<p>Also, since the database sends <strong>multiple write requests in parallel</strong> to different storage nodes, if one of those storage nodes or the network path to it is slow, it can cause <strong>delays</strong>. This means the overall speed of the database can be limited by the <strong>slowest node</strong> or network path, even if the others are performing well.</p>
<p>In simpler terms: by spreading the work across many storage servers, the disks aren’t the problem anymore, but now the speed of the <strong>network</strong> between the database and those servers becomes the main thing that can slow things down. Even one <strong>slow server</strong> in the storage fleet can affect the overall speed.</p>
<p>Now the question is: <strong>how did they optimize the network problem mentioned above?</strong></p>
<h2 id="heading-design-choices">Design choices</h2>
<h3 id="heading-reducing-network-traffic-with-redo-logs"><strong>Reducing Network Traffic with Redo Logs</strong></h3>
<ul>
<li><p><strong>Traditional Problem</strong>: both <strong>compute</strong> and <strong>storage</strong> typically reside on the same server. Large chunks of data, such as full data pages, are written to disk during transactions, generating significant <strong>I/O load</strong>. As databases grow and scale, or in clustered environments, this can lead to performance bottlenecks. When <strong>compute and storage are separated</strong>, such as in a distributed cloud system like Aurora, these writes would require <strong>network communication</strong> between the database tier and the storage tier, further amplifying traffic and introducing latency.</p>
</li>
<li><p><strong>Aurora’s Solution</strong>: Aurora <strong>only sends redo logs</strong> (small records that track changes made to the database) to the storage layer, rather than full data pages. These logs are much smaller in size and require less network bandwidth, <strong>drastically reducing network I/O</strong>. This design reduces the overall data that needs to be transmitted over the network by an order of magnitude.</p>
</li>
</ul>
<h3 id="heading-parallel-writes-to-distributed-storage"><strong>Parallel Writes to Distributed Storage</strong></h3>
<ul>
<li><p><strong>Traditional Problem</strong>: In a traditional database setup, all writes would go to a single storage device, creating a bottleneck. Even with distributed systems, data replication to multiple nodes increases network load and complexity.</p>
</li>
<li><p><strong>Aurora’s Solution</strong>: Aurora writes the redo log <strong>in parallel to multiple storage nodes</strong> across multiple availability zones (AZs). This ensures that the system is resilient to node failures and improves performance by distributing the work. Instead of a single node handling all writes, they are spread across many nodes. Additionally, by <strong>splitting the I/O operations</strong> across a fleet of storage servers, it prevents overloading any single server or network link (a quorum-write sketch follows this list).</p>
</li>
</ul>
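<p>As a rough sketch of this quorum-style parallel write path (the Aurora paper uses six copies across three AZs with a four-out-of-six write quorum; the <code>node.append</code> call is a hypothetical storage-node API):</p>
<pre><code class="lang-python">from concurrent.futures import ThreadPoolExecutor, as_completed

WRITE_QUORUM = 4  # per the paper: a write is durable once 4 of 6 nodes ack

def replicate_log_record(record, storage_nodes):
    """Send a redo log record to all storage nodes in parallel and
    return as soon as a quorum acknowledges, so one slow node or
    network path doesn't stall the write."""
    pool = ThreadPoolExecutor(max_workers=len(storage_nodes))
    futures = [pool.submit(node.append, record) for node in storage_nodes]
    acks = 0
    for future in as_completed(futures):
        if future.exception() is None:
            acks += 1
            if acks >= WRITE_QUORUM:
                pool.shutdown(wait=False)  # stragglers finish in the background
                return True
    pool.shutdown(wait=False)
    return False  # quorum not reached; treat the write as failed
</code></pre>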
<h3 id="heading-asynchronous-background-operations"><strong>Asynchronous Background Operations</strong></h3>
<ul>
<li><p><strong>Traditional Problem</strong>: Operations like <strong>backups</strong> and <strong>crash recovery</strong> are usually <strong>synchronous</strong> and happen in real-time, which can spike network traffic and lead to bottlenecks.</p>
</li>
<li><p><strong>Aurora’s Solution</strong>: Aurora offloads complex tasks like <strong>backup</strong> and <strong>redo recovery</strong> to the distributed storage fleet, where they are performed <strong>continuously in the background</strong> and <strong>asynchronously</strong>. This means the database doesn’t have to pause to perform these tasks, and they don’t generate massive network loads all at once. Instead, traffic is <strong>spread out over time</strong> and across nodes.</p>
</li>
</ul>
<h3 id="heading-fault-tolerance-and-self-healing-mechanism"><strong>Fault Tolerance and Self-Healing Mechanism</strong></h3>
<ul>
<li><p><strong>Traditional Problem</strong>: If a <strong>single node or network path</strong> slows down or fails, it can cause significant performance degradation. In a split architecture, the failure of a storage node or network path can delay the entire system.</p>
</li>
<li><p><strong>Aurora’s Solution</strong>: Aurora’s storage layer is <strong>fault-tolerant</strong> and <strong>self-healing</strong>. If a storage node, disk, or network path becomes slow or fails, the system automatically reroutes traffic to healthy nodes. This reduces the impact of <strong>outliers</strong> (i.e., slow nodes or links), ensuring that performance issues at one storage node don’t bottleneck the entire system.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728128910872/316cdb5a-3e31-408e-b9d7-f0b7c219cd58.png" alt class="image--center mx-auto" /></p>
<p>The image above is a bird’s-eye view of Aurora’s architecture. Aurora uses the AWS RDS control plane, and the database engine is a fork of “community” MySQL/InnoDB that diverges primarily in how InnoDB reads and writes data to disk, mainly in the redo-log part as mentioned above. Backups are stored on AWS S3 blob storage.</p>
<p>This was a quick blog post summarizing Aurora’s refreshing take on relational database architecture. The main goal of this article was to give a high-level understanding of the differences, highlighting the problems that appear at high scale and the reasoning behind these design choices. Every choice exists because of a problem. Thank you for tuning in, and till the next one :)</p>
<h1 id="heading-references">References</h1>
<ol>
<li><a target="_blank" href="https://web.stanford.edu/class/cs245/readings/aurora.pdf">https://web.stanford.edu/class/cs245/readings/aurora.pdf</a></li>
</ol>
]]></content:encoded></item><item><title><![CDATA[White Paper Summaries | Apache Flink]]></title><description><![CDATA[Hello folks! In this summary we're going to be talking about Apache Flink. We're going to dive into what it is, what problems does it aim to solve and a few deep dives here and there. Let's start
Introduction
Apache Flink is an open-source system for...]]></description><link>https://hewi.blog/white-paper-summaries-apache-flink</link><guid isPermaLink="true">https://hewi.blog/white-paper-summaries-apache-flink</guid><category><![CDATA[apache]]></category><category><![CDATA[apache-flink]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[Batch Processing]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Thu, 22 Aug 2024 12:00:34 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1724327975148/4cc9ceaf-0b67-475c-8d44-6343123eb69b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello folks! In this summary we're going to be talking about Apache Flink. We're going to dive into what it is, what problems it aims to solve, and a few deep dives here and there. Let's start</p>
<h1 id="heading-introduction">Introduction</h1>
<p>Apache Flink is an open-source system for processing streaming and batch data. Flink is built on the philosophy that many classes of data processing applications, including real-time analytics, continuous data pipelines, historic data processing (batch), and iterative algorithms (machine learning, graph analysis) can be expressed and executed as pipelined fault-tolerant dataflows.</p>
<p>There exist mainly two types of data processing:</p>
<ol>
<li><p>Data stream processing (real time)</p>
</li>
<li><p>Batch processing (static)</p>
</li>
</ol>
<p>Both have their use cases depending on the business model. However, recently there has been an increase in the processing of real-time data, whether it be logs, changes to the application state, readings, etc.</p>
<p>However, most streams aren't actually treated as streams; they're processed in batches (statically), where a batch might cover a specific time period, for example. Data collection tools, workflow managers, and schedulers orchestrate the creation and processing of batches. These approaches suffer from high latency (imposed by the batches), high complexity (connecting and orchestrating several systems, and implementing business logic twice), as well as arbitrary inaccuracy, since the time dimension is not explicitly handled by the application code.</p>
<p>Apache Flink follows a paradigm that embraces data-stream processing as the unifying model for <strong>real-time analysis, continuous streams, and batch processing</strong> both in the programming model and in the execution engine. Flink supports different notions of time (event-time, ingestion-time, processing-time) in order to give programmers high flexibility in defining how events should be correlated.</p>
<p>Batch programs are special cases of streaming programs, where the stream is finite, and the order and time of records does not matter (all records implicitly belong to one all-encompassing window). However, to support batch use cases with competitive ease and performance, Flink has a specialized API for processing static data sets, uses specialized data structures and algorithms for the batch versions of operators like join or grouping, and uses dedicated scheduling strategies. The result is that Flink presents itself as a full-fledged and efficient batch processor on top of a streaming runtime.</p>
<h1 id="heading-system-architecture">System Architecture</h1>
<p>Now that we have a good overview of what Flink does, let's talk about its architecture.</p>
<p>Flink consists of four main layers;</p>
<ol>
<li><p>Deployment</p>
</li>
<li><p>Core</p>
</li>
<li><p>API</p>
</li>
<li><p>Libraries</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724323199740/81d310e5-ae48-47f5-892c-2382d3b9330b.png" alt class="image--center mx-auto" /></p>
<p>The core of Flink is the distributed dataflow engine, which executes dataflow programs.</p>
<p>A Flink runtime program is a DAG of stateful operators connected with data streams.</p>
<blockquote>
<p><strong>Directed Acyclic Graph (DAG):</strong></p>
<ul>
<li><p>The program is represented as a DAG, where each node is a computation (e.g., a function, a transformation) and each edge represents the flow of data between these nodes.</p>
</li>
<li><p>The edges indicate the direction of data flow, from data sources through transformations to outputs.</p>
</li>
</ul>
</blockquote>
<p>There are two core APIs in Flink:</p>
<ol>
<li><p>The DataSet API for processing finite data sets (often referred to as <em>batch processing</em>)</p>
</li>
<li><p>The DataStream API for processing potentially unbounded data streams (often referred to as <em>stream processing</em>).</p>
</li>
</ol>
<p>Flink’s core runtime engine can be seen as a streaming dataflow engine, and both the DataSet and DataStream APIs create runtime programs executable by the engine.</p>
<p>As such, it serves as the common fabric to abstract both bounded (batch) and unbounded (stream) processing.</p>
<p>Flink bundles domain-specific libraries and APIs that generate DataSet and DataStream API programs, currently, FlinkML for machine learning, Gelly for graph processing and Table for SQL-like operations.</p>
<h2 id="heading-flink-cluster-architecture">Flink Cluster Architecture</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724323561320/6a0d5113-7e1b-415d-8eb4-94103474b1db.png" alt class="image--center mx-auto" /></p>
<p>A Flink cluster comprises three types of processes:</p>
<ol>
<li><p>Client</p>
</li>
<li><p>Job Manager</p>
</li>
<li><p>At least one Task Manager</p>
</li>
</ol>
<p><strong>The client</strong> takes the program code, transforms it to a dataflow graph, and submits that to the JobManager. This transformation phase also examines the data types (schema) of the data exchanged between operators and creates serializers and other type/schema specific code.</p>
<p><strong>DataSet programs (batch)</strong> additionally go through a cost-based query optimization phase, similar to the physical optimizations performed by relational query optimizers.</p>
<p><strong>The</strong> <strong>JobManager</strong> coordinates the distributed execution of the dataflow, It tracks the state and progress of each operator and stream, schedules new operators, and coordinates checkpoints and recovery.</p>
<p>In a high-availability setup, the JobManager persists a minimal set of metadata at each checkpoint to a fault-tolerant storage, such that a standby JobManager can reconstruct the checkpoint and recover the dataflow execution from there.</p>
<p><strong>The actual data processing takes place in the TaskManagers</strong>. A TaskManager executes one or more operators that produce streams, and reports on their status to the JobManager. The TaskManagers maintain the buffer pools to buffer or materialize the streams, and the network connections to exchange the data streams between operators.</p>
<blockquote>
<p>An operator is a node in the DAG mentioned above, it's a processing step that the stream goes into.</p>
</blockquote>
<h1 id="heading-streaming-dataflows">Streaming Dataflows</h1>
<p>Although users can write Flink programs using a multitude of APIs, all Flink programs eventually compile down to a common representation: the dataflow graph.</p>
<p>The dataflow graph is executed by Flink’s runtime engine, the common layer underneath both the batch processing (DataSet) and stream processing (DataStream) APIs.</p>
<h2 id="heading-dataflow-graph">Dataflow Graph</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724323987642/a442e856-4eb1-4d76-a7f7-645e97b07207.png" alt class="image--center mx-auto" /></p>
<p>The dataflow graph as depicted in Figure 3 is a directed acyclic graph (DAG) that consists of the following:</p>
<ol>
<li><p>Stateful Operators</p>
</li>
<li><p>Data streams that represent data produced by an operator and are available for consumption by operators.</p>
</li>
</ol>
<p>Dataflow graphs are executed in a data-parallel fashion: the same operation is applied to different partitions of the dataset at the same time across multiple computing resources (e.g., CPUs, or machines in a cluster).</p>
<p>Instead of processing one piece of data after another (sequential processing), the system processes many pieces of data in parallel.</p>
<p>Operators are parallelized into one or more parallel instances called <em>subtasks</em>, and streams are split into one or more <em>stream partitions</em> (one partition per producing subtask). The stateful operators (which may be stateless as a special case) implement all of the processing logic (e.g., filters, hash joins and stream window functions).</p>
<p>Streams distribute data between producing and consuming operators in various patterns, such as point-to-point, broadcast, re-partition, fan-out, and merge.</p>
<p>So the main idea is to split the data between producers and consumers, parallelizing it over the operators.</p>
<h2 id="heading-data-exchange-through-intermediate-data-streams"><strong>Data Exchange through Intermediate Data Streams</strong></h2>
<p><mark>Flink’s intermediate data streams are the core abstraction for data-exchange between operators. An intermediate data stream represents a logical handle to the data that is produced by an operator and can be consumed by one or more operators.</mark></p>
<p><strong><mark>Intermediate streams</mark></strong> <mark>are logical in the sense that the data they point to may or may not be materialized on disk.</mark></p>
<p><strong>Pipelined Streams</strong>: These are used in Apache Flink to allow different parts of a dataflow (producers and consumers) to run at the same time. Data is sent from one operator to the next without waiting for the entire dataset to be processed first. This allows for faster, real-time processing.</p>
<p>If a downstream operator (consumer) is slow, it can slow down the upstream operator (producer), creating "backpressure." Flink manages short-term fluctuations in data flow using buffers.</p>
<p><strong>Blocking Streams</strong>: These are used when you need to fully process and store data from one operator before moving on to the next.</p>
<p>The producing operator finishes its work and stores all its output before the consuming operator starts processing. This separates the two operators into distinct stages.</p>
<p>Since all data is stored before being passed on, blocking streams use more memory and may write data to disk if needed.</p>
<p>There’s no backpressure since the next stage only starts after the current stage is fully complete.</p>
<p>Blocking streams are useful when you need to isolate operators (like in complex operations such as sorting) to prevent issues like distributed deadlocks in the system.</p>
<hr />
<p>When <strong>Flink</strong> processes data, it splits data into chunks called buffers before sending them from one operator (producer) to another (consumer).</p>
<ul>
<li><p>A buffer can be sent as soon as it’s full, or</p>
</li>
<li><p>It can be sent after a certain amount of time, even if it’s not full. (timeout)</p>
</li>
</ul>
<p>Here comes the tradeoff between latency and throughput;</p>
<p><strong>Latency</strong>: How quickly data is processed and moved through the system.</p>
<p><strong>Throughput</strong>: How much data the system can handle in a given time period.</p>
<p><strong>Low Latency</strong>: To achieve low latency (faster response times), Flink sends buffers more quickly, even if they’re not full. This means data moves through the system faster, but the throughput (amount of data processed) might be lower. (or even small buffers)</p>
<p><strong>High Throughput</strong>: To achieve higher throughput (processing more data), Flink waits until buffers are full before sending them. This increases the amount of data processed at once but can slow down the response time, leading to higher latency. (larger buffers also)</p>
<p>Flink allows you to balance between how fast data is processed (latency) and how much data is processed at once (throughput) by adjusting how buffers are handled. Shorter timeouts mean faster data movement but lower throughput, while longer timeouts mean higher throughput but slower data movement.</p>
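<p>Conceptually, the buffering logic is a "flush when full or when the timeout expires" policy. Here is a minimal sketch of the idea (not Flink's actual internals; a real implementation would also flush from a background timer rather than only when a new record arrives):</p>
<pre><code class="lang-python">import time

class SendBuffer:
    """Sketch of the latency/throughput knob: a small capacity or short
    timeout favors latency; a large capacity favors throughput."""

    def __init__(self, send, capacity=100, timeout_seconds=0.1):
        self.send = send                # downstream transport (assumed callable)
        self.capacity = capacity        # bigger batches favor throughput
        self.timeout = timeout_seconds  # shorter timeouts favor latency
        self.records = []
        self.last_flush = time.monotonic()

    def add(self, record):
        self.records.append(record)
        full = len(self.records) >= self.capacity
        expired = time.monotonic() - self.last_flush >= self.timeout
        if full or expired:
            self.flush()

    def flush(self):
        if self.records:
            self.send(self.records)  # one network hand-off for the whole batch
        self.records = []
        self.last_flush = time.monotonic()
</code></pre>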
<hr />
<p>Apart from exchanging data, streams in Flink communicate different types of <strong>control events</strong>. These are <strong>special events</strong> injected in the data stream by operators, and are delivered in-order along with all other data records and events within a stream partition. The receiving operators react to these events by performing certain actions upon their arrival. Examples are;</p>
<ol>
<li><p>Checkpoint Barriers; used to create a snapshot of the data processing at a specific point in time.</p>
</li>
<li><p>Watermarks; markers in the data stream that show how far along the system is in processing time-based events.</p>
</li>
<li><p>Iteration Barriers; used in specialized algorithms that require multiple passes over the data (iterative algorithms).</p>
</li>
</ol>
<p>Streaming dataflows in Flink do not provide ordering guarantees after any form of repartitioning or broadcasting and the responsibility of dealing with out-of-order records is left to the operator implementation.</p>
<h2 id="heading-iterative-dataflow">Iterative Dataflow</h2>
<p>Iterations are important for tasks like graph processing and machine learning, where you often need to repeatedly process data to refine results. In traditional approaches, you either submit a new job for each iteration or add more nodes to the processing graph.</p>
<p>In Flink, iterations are managed by special operators called iteration steps. These steps allow the processing of data to repeat in a controlled manner.</p>
<p>Flink’s iteration steps use <strong>feedback edges to create loops in the data processing pipeline.</strong> This enables data to flow back into the iteration step, allowing for iterative processing.</p>
<p>Flink uses <strong>head and tail tasks (thought as operators)</strong> to manage the flow of data through the iteration steps. These tasks handle the data records that are fed back into the iteration, ensuring that the processing is coordinated.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724326356221/75a4cca5-5225-45e7-ac2a-cdffc0d8fbdf.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-fault-tolerance">Fault Tolerance</h1>
<p>Flink offers reliable execution with strict exactly-once-processing consistency guarantees and deals with failures via checkpointing and partial re-execution.</p>
<p>The general assumption the system makes to effectively provide these guarantees is that the data sources are <strong>persistent</strong> and <strong>replayable</strong>.</p>
<p>Examples of such sources are files and durable message queues (e.g., Apache Kafka).</p>
<p>As mentioned before, Flink uses a system called checkpointing to make sure that, even if something goes wrong, your data processing continues exactly where it left off without losing or duplicating data.</p>
<p>Data streams can be huge and never-ending, so if you had to start over after a failure, it could take months to catch up. That would be impractical.</p>
<p>To avoid this, Flink regularly saves snapshots of the current state of the data processing, including the exact position in the data stream. If something fails, Flink can quickly recover using these snapshots, so it doesn’t have to reprocess everything from the beginning.</p>
<p>The core challenge they faced when saving snapshots is that all parallel operators (processing units) need to take a snapshot of their state at the same logical time: capturing a consistent view of the entire data processing system without stopping it.</p>
<p>So they introduced something called <strong>Asynchronous Barrier Snapshotting (ABS)</strong>:</p>
<ul>
<li><p>Special markers (called barriers) are inserted into the data streams. These barriers represent a specific point in time.</p>
</li>
<li><p>When a barrier reaches an operator, it marks that operator’s state as part of the current snapshot. Data before the barrier is included in the snapshot, and data after the barrier is not.</p>
</li>
<li><p>This process allows Flink to take snapshots without stopping the entire data processing system, thus keeping the system running smoothly.</p>
</li>
</ul>
<p>Each partition of a stream operates independently and will have its own barriers. When a barrier is inserted into the stream, it travels through each partition separately.</p>
<p>The barriers represent the same logical time across all partitions, but they may not arrive simultaneously at every partition due to differences in processing speed and network delays.</p>
<blockquote>
<p>How it works, in depth:</p>
<ol>
<li><p><strong>Alignment Phase</strong>: Each operator in the data pipeline receives barriers from upstream operators. Before taking a snapshot, the operator makes sure that it has received all barriers from all of its input streams. This ensures that the snapshot reflects a consistent point in time across all inputs.</p>
</li>
<li><p><strong>State Saving</strong>: After confirming all barriers are received, the operator saves its current state (e.g., contents of windows or custom data structures) to durable storage, such as HDFS or another storage system.</p>
</li>
<li><p><strong>Barrier Forwarding</strong>: Once the state is safely backed up, the operator forwards the barrier to the next operators downstream. This continues until all operators have taken their snapshots and forwarded the barriers.</p>
</li>
<li><p><strong>Complete Snapshot</strong>: The snapshot process is complete when all operators have registered their states and forwarded the barriers. The snapshot captures all operator states as they were when the barriers passed through, ensuring a consistent global snapshot (see the conceptual sketch right after this quote).</p>
</li>
</ol>
<p>Recovery Process:</p>
<p><strong>Restoring State</strong>:</p>
<ul>
<li><p><strong>From Snapshots</strong>: When a failure occurs, Flink restores all operator states from the last successful snapshot.</p>
</li>
<li><p><strong>Restarting Streams</strong>: Input streams are restarted from the point of the latest barrier that has a snapshot. This limits the amount of data that needs to be reprocessed to just the records between the last two barriers.</p>
</li>
</ul>
</blockquote>
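<p>Here's a small conceptual sketch of that align, save, forward sequence in Python. The class and its structure are my own simplification for illustration, not Flink's real implementation.</p>
<pre><code class="lang-python">import copy

class Operator:
    """Conceptual ABS participant: aligns barriers, snapshots, forwards."""
    def __init__(self, name, num_inputs):
        self.name = name
        self.num_inputs = num_inputs
        self.state = {}             # e.g. window contents or counters
        self.barriers_seen = set()  # input channels whose barrier arrived

    def on_record(self, record):
        # Records arriving before the barrier are reflected in the snapshot;
        # records arriving after it belong to the next checkpoint.
        self.state[record["key"]] = self.state.get(record["key"], 0) + 1

    def on_barrier(self, channel, snapshot_store, downstream):
        self.barriers_seen.add(channel)
        # Alignment phase: wait until the barrier arrived on every input.
        if len(self.barriers_seen) == self.num_inputs:
            # State saving: back up a consistent copy to durable storage.
            snapshot_store[self.name] = copy.deepcopy(self.state)
            # Barrier forwarding: only now pass the barrier downstream.
            for op in downstream:
                op.on_barrier(self.name, snapshot_store, [])
            self.barriers_seen.clear()

store = {}
sink = Operator("sink", num_inputs=1)
source = Operator("source", num_inputs=1)
source.on_record({"key": "a"})
source.on_barrier("input-0", store, [sink])
print(store)  # {'source': {'a': 1}, 'sink': {}}
</code></pre>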
<p><strong>Benefits of ABS:</strong></p>
<ol>
<li><p>It guarantees exactly-once state updates without ever pausing the computation</p>
</li>
<li><p>The checkpointing mechanism is independent of other control messages in the system, like events triggering window computations. This means it doesn’t interfere with other data processing features.</p>
</li>
<li><p>ABS is not tied to any specific storage system. The state can be backed up to various storage systems depending on the environment, like file systems or databases.</p>
</li>
</ol>
<h1 id="heading-stream-analytics-on-top-of-dataflows"><strong>Stream Analytics on Top of Dataflows</strong></h1>
<p>Flink’s DataStream API is designed for stream processing, handling complex tasks like time management, windowing, and state maintenance. It builds on Flink’s runtime, which already supports efficient data transfers, stateful operations, and fault tolerance. The API allows users to define how data is grouped and processed over time, while the underlying runtime manages these operations efficiently and reliably.</p>
<h2 id="heading-the-notion-of-time"><strong>The Notion of Time</strong></h2>
<p>Flink distinguishes between two notions of time:</p>
<ol>
<li><p>Event-time, which denotes the time when an event originates (e.g., the timestamp associated with a signal arising from a sensor, such as a mobile device)</p>
</li>
<li><p>Processing-time, which is the wall-clock time of the machine that is processing the data.</p>
</li>
</ol>
<p>There can be differences (skew) between event-time and processing-time, leading to potential delays when processing events based on their actual event-time.</p>
<p>Hence they introduce watermarks:</p>
<p>Watermarks are special events used to track the progress of time within a stream processing system. They help the system understand which events have been processed and which are still pending.</p>
<p>A watermark includes a time attribute <code>t</code>, indicating that all events with a timestamp lower than <code>t</code> have been processed.</p>
<p>Watermarks originate from the sources of the data stream and travel through the entire processing topology. As they move, they help maintain a consistent view of time across different operators.</p>
<p>Operators like <code>map</code> or <code>filter</code> just forward the watermarks they receive. Operators that perform calculations based on watermarks (e.g., event-time windows) compute results triggered by the watermark and then forward it. For multiple inputs, the operator forwards the minimum of the incoming watermarks to ensure accurate results.</p>
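<p>The minimum rule is small enough to show in a couple of lines. This is just a sketch of the logic, not Flink code:</p>
<pre><code class="lang-python">def forward_watermark(incoming_watermarks):
    """An operator with several inputs may only advance its event-time clock
    to the minimum of the watermarks it has received; anything higher could
    wrongly declare still-pending events as complete."""
    return min(incoming_watermarks.values())

# Two upstream partitions report different progress:
watermarks = {"input-a": 1_000, "input-b": 850}
print(forward_watermark(watermarks))  # 850: events up to t=850 are complete
</code></pre>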
<p>Flink programs that are based on processing-time rely on local machine clocks, and hence possess a less reliable notion of time, which can lead to inconsistent replays upon recovery. However, they exhibit lower latency. Programs that are based on event-time provide the most reliable semantics, but may exhibit latency due to event-time-processing-time lag. Flink includes a third notion of time as a special case of event-time called <em>ingestion-time</em>, which is the time that events enter Flink. That achieves a lower processing latency than event-time and leads to more accurate results in comparison to processing-time.</p>
<h2 id="heading-stateful-stream-processing"><strong>Stateful Stream Processing</strong></h2>
<p>State is critical to many applications, such as machine-learning model building, graph analysis, user session handling, and window aggregations. There is a plethora of different types of states depending on the use case. For example, the state can be something as simple as a counter or a sum or more complex, such as a classification tree or a large sparse matrix often used in machine-learning applications. Stream windows are stateful operators that assign records to continuously updated buckets kept in memory as part of the operator state.</p>
<h3 id="heading-state-management-in-flink">State Management in Flink</h3>
<ol>
<li><p><strong>Explicit State Handling</strong>:</p>
<ul>
<li><p><strong>State Registration</strong>: Flink allows users to explicitly manage state within their applications. This means users can define and work with state in a clear and controlled way.</p>
</li>
<li><p><strong>Operator Interfaces/Annotations</strong>: Flink provides interfaces or annotations that enable you to register local variables within an operator's scope. This ensures that the state you define is closely associated with the specific operator that needs it.</p>
</li>
</ul>
</li>
<li><p><strong>Operator-State Abstraction</strong>:</p>
<ul>
<li><p><strong>Key-Value States</strong>: Flink offers a high-level abstraction for state management. You can declare state as partitioned key-value pairs, which allows for efficient and flexible management of state within streaming applications.</p>
</li>
<li><p><strong>Associated Operations</strong>: Along with declaring state, Flink provides operations to interact with this state, such as reading, updating, and deleting state entries (a rough sketch follows this list).</p>
</li>
</ul>
</li>
<li><p><strong>State Backend Configurations</strong>:</p>
<ul>
<li><p><strong>StateBackend Abstractions</strong>: Users can configure how state is stored and managed using StateBackend abstractions. This includes specifying the storage mechanism (e.g., file system, database) and how the state is checkpointed.</p>
</li>
<li><p><strong>Custom State Management</strong>: This flexibility allows for custom state management solutions tailored to specific application needs and performance requirements.</p>
</li>
</ul>
</li>
<li><p><strong>Checkpointing and Durability</strong>:</p>
<ul>
<li><strong>Exactly-Once Semantics</strong>: Flink’s checkpointing mechanism ensures that any registered state is durable and maintained with exactly-once update semantics. This means that state changes are reliably recorded and can be accurately recovered in case of failures.</li>
</ul>
</li>
</ol>
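<p>As a rough mental model, partitioned key-value state behaves like a per-key store that the checkpointing mechanism snapshots. The Python class below is hypothetical and only mirrors the shape of interfaces like Flink's <code>ValueState</code>; it is not the real API.</p>
<pre><code class="lang-python">class KeyedValueState:
    """Toy stand-in for partitioned key-value operator state."""
    def __init__(self):
        self._by_key = {}  # key -> value; checkpointed alongside the operator

    def value(self, key, default=None):
        return self._by_key.get(key, default)

    def update(self, key, value):
        self._by_key[key] = value

    def clear(self, key):
        self._by_key.pop(key, None)

# A per-user running count inside a hypothetical keyed operator:
counts = KeyedValueState()
for user in ["alice", "bob", "alice"]:
    counts.update(user, counts.value(user, 0) + 1)
print(counts.value("alice"))  # 2
</code></pre>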
<h3 id="heading-stream-windows"><strong>Stream Windows</strong></h3>
<p>Incremental computations over unbounded streams are often evaluated over continuously evolving logical views, called windows. Apache Flink incorporates windowing within a stateful operator that is configured via a flexible declaration composed of three core functions:</p>
<ol>
<li><p>Window <em>assigner</em></p>
</li>
<li><p><em>Trigger</em> (optional)</p>
</li>
<li><p><em>Evictor</em></p>
</li>
</ol>
<ol>
<li><p><strong>Window assigner:</strong></p>
</li>
</ol>
<p>Assigns each record to one or more logical windows.</p>
<p><strong>Examples</strong>:</p>
<ul>
<li><p><strong>Time Windows</strong>: Based on timestamps (e.g., 6-second windows).</p>
</li>
<li><p><strong>Count Windows</strong>: Based on the number of records (e.g., 1000 records).</p>
</li>
<li><p><strong>Sliding Windows</strong>: Overlapping windows that can cover multiple periods or counts (e.g., a window every 2 seconds).</p>
</li>
</ul>
<ol start="2">
<li><strong>Trigger:</strong></li>
</ol>
<p>Determines when the operation associated with the window is performed.</p>
<p><strong>Examples</strong>:</p>
<ul>
<li><p><strong>Event Time Trigger</strong>: The operation happens when a watermark passes the end of the window.</p>
</li>
<li><p><strong>Count Trigger</strong>: The operation happens after a certain number of records (e.g., every 1000 records).</p>
</li>
</ul>
<ol start="3">
<li><p><strong>Evictor:</strong></p>
<p> Decides which records to keep within each window (a toy sketch of all three functions follows this list).</p>
<p> <strong>Examples</strong>:</p>
<ul>
<li><strong>Count Evictor</strong>: Keeps a fixed number of the most recent records (e.g., the last 100 records).</li>
</ul>
</li>
</ol>
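<p>Here's a toy sketch in Python showing the three functions cooperating on a stream of events. The function names and thresholds are made up for illustration; in Flink these would be assigner, trigger, and evictor objects configured on a windowed stream.</p>
<pre><code class="lang-python">def tumbling_assigner(record, size=6):
    """Window assigner: map an event timestamp to a 6-second window."""
    start = (record["ts"] // size) * size
    return (start, start + size)

def count_trigger(window_records, threshold=3):
    """Trigger: fire once the window holds `threshold` records."""
    return len(window_records) >= threshold

def count_evictor(window_records, keep=2):
    """Evictor: keep only the most recent `keep` records."""
    return window_records[-keep:]

windows = {}  # window -> records (purging after firing omitted for brevity)
for event in [{"ts": 1, "v": 10}, {"ts": 4, "v": 20}, {"ts": 5, "v": 30}]:
    window = tumbling_assigner(event)
    windows.setdefault(window, []).append(event)
    if count_trigger(windows[window]):
        kept = count_evictor(windows[window])
        print(window, sum(e["v"] for e in kept))  # (0, 6) 50
</code></pre>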
<h1 id="heading-batch-analytics-on-top-of-dataflows"><strong>Batch Analytics on Top of Dataflows</strong></h1>
<p><strong>Streaming and Batch Processing</strong>: Flink uses the same runtime engine for both streaming and batch computations. This means that both types of workloads benefit from the same execution infrastructure.</p>
<p><strong>Handling Batch Computations</strong>:</p>
<ul>
<li><p><strong>Blocking Data Streams</strong>: For batch processing, large computations can be broken into isolated stages using blocking data streams. These stages are executed sequentially, which allows for efficient processing and scheduling.</p>
</li>
<li><p><strong>Turning Off Periodic Snapshotting</strong>: When the overhead of periodic snapshotting (used for fault tolerance) is high, it is turned off. Instead, fault recovery is managed by replaying lost data from the most recent materialized intermediate stream, which could be from the source.</p>
</li>
</ul>
<p><strong>Blocking Operators</strong>:</p>
<ul>
<li><p><strong>Definition</strong>: Blocking operators (like sorts) are those that wait until they have consumed their entire input before proceeding. The runtime does not differentiate between blocking and non-blocking operators.</p>
</li>
<li><p><strong>Memory Management</strong>: These operators use managed memory, which can be on or off the JVM heap. If their memory usage exceeds available memory, they can spill data to disk (a sketch of this spilling pattern follows the list).</p>
</li>
</ul>
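<p>As an illustration of a blocking operator that spills, here's a hedged sketch of an external sort in Python: it consumes its whole input, spills sorted runs to disk whenever an (artificially tiny) memory budget is exceeded, and then merges the runs. Flink does this with managed memory segments rather than Python lists, so treat this purely as the shape of the idea.</p>
<pre><code class="lang-python">import heapq
import tempfile

def external_sort(values, max_in_memory=2):
    """Blocking sort: read the entire input, spilling sorted runs to disk."""
    run_files, buffer = [], []

    def spill():
        buffer.sort()
        run = tempfile.TemporaryFile(mode="w+t")
        run.writelines(f"{v}\n" for v in buffer)
        run.seek(0)
        run_files.append(run)
        buffer.clear()

    for v in values:                      # blocking: consume the whole input
        buffer.append(v)
        if len(buffer) >= max_in_memory:  # memory budget exceeded: spill
            spill()
    if buffer:
        spill()
    runs = [map(int, run) for run in run_files]
    return list(heapq.merge(*runs))       # streaming merge of the sorted runs

print(external_sort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]
</code></pre>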
<p><strong>DataSet API</strong>:</p>
<ul>
<li><p><strong>Batch Abstractions</strong>: The DataSet API provides abstractions specifically for batch processing. It includes a bounded DataSet structure and transformations like joins, aggregations, and iterations.</p>
</li>
<li><p><strong>Fault-Tolerance</strong>: DataSets are designed to be fault-tolerant, ensuring reliable processing of batch data.</p>
</li>
</ul>
<p><strong>Query Optimization</strong>:</p>
<ul>
<li><p><strong>Optimization Layer</strong>: Flink includes a query optimization layer that transforms DataSet programs into efficient executable plans. This optimization helps improve performance and resource utilization.</p>
</li>
<li><p>Flink uses advanced techniques to optimize query execution, considering network, disk, and CPU costs, and incorporates user hints for better accuracy.</p>
</li>
</ul>
<p><strong>Memory Management</strong>: Flink improves memory efficiency by serializing data into segments, processing data in binary form, and minimizing garbage collection.</p>
<p><strong>Batch Iterations</strong>: Flink supports various iteration models and optimizes iterative processes with techniques like delta iterations for efficient computation.</p>
<p>This approach enables Flink to effectively manage and optimize both streaming and batch processing tasks, leveraging a unified runtime and specialized APIs for different types of workloads.</p>
<h1 id="heading-summary">Summary</h1>
<p>In this article, we dove deep into Apache Flink, exploring its core functionalities and advanced techniques. Key topics covered include:</p>
<ul>
<li><p><strong>Unified Data Processing</strong>: We examined how Flink’s runtime supports both streaming and batch processing, allowing seamless handling of continuous and bounded data.</p>
</li>
<li><p><strong>Fault Tolerance</strong>: We detailed Flink’s checkpointing mechanism, which ensures exactly-once processing guarantees by capturing consistent snapshots of operator states and stream positions.</p>
</li>
<li><p><strong>State Management</strong>: We explored Flink’s approach to explicit state handling, including state abstractions and custom configurations for flexible state storage and checkpointing.</p>
</li>
<li><p><strong>Windowing</strong>: We discussed Flink’s robust windowing system, which supports a variety of time-based and count-based windows, and handles out-of-order events.</p>
</li>
<li><p><strong>Batch Processing Optimization</strong>: We covered how Flink adapts its runtime for batch processing with techniques like blocking operators and efficient data management.</p>
</li>
<li><p><strong>Query Optimization</strong>: We looked into Flink’s advanced query optimization strategies, including cost-based planning and handling of complex UDF-heavy DAGs.</p>
</li>
<li><p><strong>Memory Management</strong>: We analyzed Flink’s memory management practices, including serialized data handling and off-heap memory usage to reduce garbage collection overhead.</p>
</li>
</ul>
<p>Overall, the article provided an in-depth look at how Flink handles data processing, fault tolerance, state management, and optimizations for both streaming and batch scenarios. Hope you guys enjoyed and till the next one!</p>
<h1 id="heading-references">References</h1>
<ol>
<li><a target="_blank" href="https://asterios.katsifodimos.com/assets/publications/flink-deb.pdf">https://asterios.katsifodimos.com/assets/publications/flink-deb.pdf</a></li>
</ol>
]]></content:encoded></item><item><title><![CDATA[White Paper Summaries | Apache Kafka]]></title><description><![CDATA[Hello everyone! In this white paper summary we're going to tackle a paper written by an engineer that works at LinkedIn who talks us through how Kafka was designed and some of the design choices they made. As usual we'll walk through the paper and hi...]]></description><link>https://hewi.blog/white-paper-summaries-apache-kafka</link><guid isPermaLink="true">https://hewi.blog/white-paper-summaries-apache-kafka</guid><category><![CDATA[kafka]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[System Architecture]]></category><category><![CDATA[System Design]]></category><dc:creator><![CDATA[Amr Elhewy]]></dc:creator><pubDate>Sat, 20 Jul 2024 20:09:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1721506101615/90b26384-3cf0-47b5-88aa-7cc572698e93.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello everyone! In this white paper summary we're going to tackle a paper written by an engineer at LinkedIn, who walks us through how Kafka was designed and some of the design choices they made. As usual, we'll walk through the paper and highlight the important parts. This is really just me reading it and writing out notes, but I do recommend you guys read it! I'll leave the link in the reference section below.</p>
<p>These articles are meant to capture the important bits of the paper, or the parts I was most interested in. I've added and removed some things and kept other parts exactly the same. The end goal is just to understand how it works, but all credit goes to the writer 100%.</p>
<blockquote>
<p>This paper was written in 2011 so it doesn't mention replication because Kafka didn't have it back then.</p>
</blockquote>
<h1 id="heading-introduction">Introduction</h1>
<p>The paper starts by addressing how log processing has become very critical in this day and age, and moves on to say that Kafka was made to tackle some of the log processing problems they faced at LinkedIn and that it took ideas from other messaging systems. Then the writer starts vouching for Kafka and how its performance and scalability are superior compared to other systems.</p>
<p>He then moves on to talk about what "log" data really is in companies (user clicks, metrics, etc.) and how back in the day it was mainly used for analytics, whereas nowadays it feeds directly into production systems in real time (search relevance, recommendations, ad targeting, security protections).</p>
<p>These types of log data are very challenging to handle due to their sheer volume, and processing them in a fast and efficient way is an even harder challenge.</p>
<p>The writer then mentions that the old way of processing these logs was scraping them all for analysis, which is very inefficient if you think about it, and that several log aggregators were built in recent years (Scribe, Flume, etc.) which normally offload the data into HDFS (Hadoop).</p>
<p>LinkedIn wanted to achieve more with log aggregation: they needed to support all the real-time applications mentioned above (search relevance, etc.) with a delay of no more than a few seconds.</p>
<p>Then Kafka comes in, a combination of traditional log aggregators and messaging systems. The most important thing was that it allowed consuming these logs in real time. In the next section we'll discuss the different messaging systems that were available at that time and why LinkedIn couldn't adopt them and had to invent Kafka instead.</p>
<h1 id="heading-related-work">Related Work</h1>
<p>The messaging systems available at that time weren't a good fit for log processing; there was a mismatch in features. They focused more on <strong>delivery guarantees</strong> rather than <strong>throughput</strong>, which was considered overkill for collecting log data, where a single unregistered click wouldn't be the end of the world. The unneeded features increased the complexity of the system, but they existed because not every system has throughput as its primary constraint. Not only that, but those systems were also very weak in distributed support: there was no easy way to partition and store messages on multiple machines.</p>
<p>The writer then talks about how these systems usually aggregate the logs and dump them periodically, and how most of them do this processing offline (not in real time). They also use a "push" model, pushing data to the consumers, which could potentially overload a consumer that is still processing data. LinkedIn found the "pull" model more convenient to work with, since each consumer can consume at its own rate and avoid being flooded by messages.</p>
<p>So to wrap up, here's the summary:</p>
<ol>
<li><p>LinkedIn wanted throughput; none of the existing systems provided that</p>
</li>
<li><p>Existing systems weren't that scalable or real-time</p>
</li>
<li><p>The push model was not going to work for LinkedIn</p>
</li>
</ol>
<p>In the next section we talk about Kafka's architecture and design principles.</p>
<h1 id="heading-kafka-architecture-and-design-principles">Kafka Architecture and Design Principles</h1>
<p>The basic outlines of Kafka are as follows:</p>
<ol>
<li><p>A stream of messages of a particular type is called a <strong>topic</strong></p>
</li>
<li><p>A producer publishes messages to the topic</p>
</li>
<li><p>The messages are stored on a set of servers called <strong>brokers</strong></p>
</li>
<li><p>A consumer subscribes to one or more topics from the broker and <strong>pulls</strong> data from the broker.</p>
</li>
</ol>
<p>In the consumer, each message stream provides an iterator interface over the continual stream of messages being produced. The consumer iterates over the messages and <strong>blocks</strong> if none exist yet; below is a conceptual sketch of such a blocking iterator.</p>
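<p>A minimal sketch of that iterator contract in Python. The queue stands in for the broker; nothing here is Kafka's real client API.</p>
<pre><code class="lang-python">import queue
import threading
import time

def message_stream(q):
    """Iterator interface over a continual stream: yields forever and
    blocks whenever no message is available yet."""
    while True:
        yield q.get()  # blocks until a message arrives

broker = queue.Queue()

def producer():
    for i in range(3):
        time.sleep(0.1)
        broker.put(f"message-{i}")

threading.Thread(target=producer, daemon=True).start()
stream = message_stream(broker)
for _ in range(3):
    print(next(stream))  # blocks between arrivals, then prints message-0..2
</code></pre>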
<p>Kafka supports two delivery models:</p>
<ol>
<li><p>Point-to-point delivery (just a basic queue where only one consumer takes a given message and processes it)</p>
</li>
<li><p>Pub/Sub model where multiple consumers get a copy of the same message.</p>
</li>
</ol>
<p>Below is a simple diagram visualizing what we wrote above (stole it from the paper 😅)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721478926556/18c2123d-06fe-43a7-856b-2b3bf135a5c8.png" alt class="image--center mx-auto" /></p>
<p>A topic is divided into multiple <strong>partitions</strong> and each broker stores one or more of those partitions. Multiple producers and consumers can publish and retrieve messages at the same time. In the next section we talk about partitions and some of the design choices that were made.</p>
<h2 id="heading-partition-layout-and-design-choices">Partition Layout and Design Choices</h2>
<p>Each partition of a topic has a logical log. Physically the log is a set of segment files where each file is around 1GB.</p>
<p>Every time a producer publishes a message to a partition, the broker simply appends the message to the last segment file. For better performance, we flush the segment files to disk only after a configurable number of messages have been published or a certain amount of time has elapsed. <strong>A message is only exposed to the consumers after it is flushed.</strong></p>
<blockquote>
<p>This is an example of trading a little bit of (configurable) latency for durability and consistency. When messages come into the broker they sit in an in-memory buffer, and after a configurable amount they get flushed to disk. Only once they are flushed can the consumer see them. This gives a huge boost in durability and consistency at the cost of some milliseconds of latency, which is very much worth it.</p>
</blockquote>
<p>Unlike typical messaging systems, a message stored in Kafka doesn’t have an explicit message id. Instead, each message is addressed by its <strong>logical offset</strong> in the log. This avoids the overhead of maintaining auxiliary, seek-intensive random-access index structures that map the message ids to the actual message locations. Note that our message ids are increasing but not consecutive. To compute the id of the next message, we have to add the length of the current message to its id.</p>
<p>A consumer always consumes messages from a particular partition <strong>sequentially</strong>. If the consumer acknowledges a particular message offset, <strong>it implies that the consumer has received all messages prior to that offset in the partition</strong>. Under the hood, the consumer is issuing asynchronous pull requests to the broker to have a buffer of data ready for the application to consume. Each pull request contains the offset of the message from which the consumption begins and an acceptable number of bytes to fetch. <strong>Each broker keeps in memory a sorted list of offsets that include the offset of the first message in every segment file</strong>. The broker locates the segment file where the requested message resides by searching the offset list, and sends the data back to the consumer. <strong>After a consumer receives a message, it computes the offset of the next message to consume and uses it in the next pull request</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721479957458/76755461-77f0-4185-91b8-5ff06c4a9e9d.png" alt class="image--center mx-auto" /></p>
<p>The image above visualizes the in-memory index present on the <strong>broker</strong>.</p>
<p>Each index entry is the first offset of a segment file; with a simple binary search, given an offset we can find the segment file that contains it. Here's a small sketch of both that lookup and the next-offset computation.</p>
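<p>The numbers and names below are illustrative, not Kafka's actual code:</p>
<pre><code class="lang-python">import bisect

# First offset of each ~1GB segment file, kept sorted in memory on the broker.
segment_start_offsets = [0, 1_073_741_824, 2_147_483_648]

def find_segment(offset):
    """Binary search: the segment holding `offset` is the last one whose
    start offset is not greater than `offset`."""
    return bisect.bisect_right(segment_start_offsets, offset) - 1

print(find_segment(1_500_000_000))  # 1 (the second segment file)

# Offsets are logical positions in the log, so the consumer computes the
# next offset by adding the length of the message it just consumed:
def next_offset(current_offset, message_length):
    return current_offset + message_length

print(next_offset(1_500_000_000, 512))  # 1500000512
</code></pre>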
<p>The writer also mentions that although the end consumer API iterates one message at a time, under the covers each pull request from a consumer actually retrieves multiple messages up to a certain size, typically hundreds of kilobytes, so that it has them ready for processing, since they're already flushed to disk on the broker.</p>
<p>One of the very smart decisions the writer talks about is depending on the <strong>file system page cache</strong> instead of <strong>in-memory caching</strong> for accessing recent messages. This has the following advantages:</p>
<ol>
<li><p>Avoids double buffering: there is no in-memory buffer; data lives only in the file system page cache</p>
</li>
<li><p>Completely offloads caching to the OS rather than the Kafka process, which is magnificent because there is no garbage collection overhead</p>
</li>
<li><p>If the Kafka process restarts, the cache still exists, since it's the OS's responsibility; it only goes away if the machine is rebooted.</p>
</li>
</ol>
<p>OS caching plays a huge role too, since producers and consumers access segment files sequentially. It was found that both production and consumption have consistent performance, linear in the data size, up to many terabytes of data.</p>
<p>Kafka also optimized network access for the consumers. Let's have a look at the normal OS process for reading a local file from disk and sending it over the network:</p>
<ol>
<li><p>Read the file from disk into memory (page cache in OS)</p>
</li>
<li><p>Since it's in memory, we need to copy it to the application buffer (still in memory), i.e., to a place in memory where the application can actually access it (the memory reserved for the application)</p>
</li>
<li><p>Then, as transmission is about to begin, the data is copied to a kernel buffer, which interacts with the underlying socket to send it.</p>
</li>
<li><p>The kernel buffer sends over the data via the socket.</p>
</li>
</ol>
<p>That's quite a lot of copies and system calls being made. Kafka optimized this by leveraging an API that exists in Linux systems called <code>sendfile</code>.</p>
<p>This directly transfers bytes from the file to the socket skipping all these copies and boosting performance.</p>
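<p>Python exposes the same idea through <code>socket.sendfile()</code>, which wraps the <code>sendfile</code> syscall on Linux. Here's a minimal sketch; the function name and path handling are hypothetical:</p>
<pre><code class="lang-python">import socket

def serve_segment(conn: socket.socket, path: str) -> None:
    """Zero-copy transfer: bytes move from the page cache straight to the
    socket buffer, never passing through application memory."""
    with open(path, "rb") as segment:
        conn.sendfile(segment)  # uses os.sendfile() under the hood on Linux
</code></pre>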
<p><strong>Stateless broker</strong>: In Kafka, the information about how much each consumer has consumed is not maintained by the broker, but by the consumer itself. Such a design reduces a lot of the complexity and the overhead on the broker. However, this makes it tricky to delete a message, since a broker doesn’t know whether all subscribers have consumed the message.</p>
<p>Kafka solves this problem by using a simple <strong>time-based SLA</strong> for the retention policy. A message is automatically deleted if it has been retained in the broker longer than a certain period, typically 7 days. This solution works well in practice. Most consumers, including the offline ones, finish consuming either daily, hourly, or in real-time. <strong>The fact that the performance of Kafka doesn’t degrade with a larger data size makes this long retention feasible.</strong></p>
<p>There is an important side benefit of this design. A consumer can deliberately <strong><em>rewind</em></strong> to an old offset and re-consume data. This violates the common contract of a queue, but proves to be an essential feature for many consumers. For example, when there is an error in application logic in the consumer, the application can re-play certain messages after the error is fixed. This is particularly important for ETL data loads into our data warehouse or Hadoop system.</p>
<p>As another example, the consumed data may be flushed to a persistent store only periodically (e.g., a full-text indexer). If the consumer crashes, the unflushed data is lost. In this case, the consumer can checkpoint the smallest offset of the un-flushed messages and re-consume from that offset when it's restarted. We note that rewinding a consumer is much easier to support in the pull model than the push model. Next up are some design considerations governing Kafka being distributed in nature.</p>
<h2 id="heading-distributed-coordination">Distributed Coordination</h2>
<p>Producers and consumers both operate in a distributed setting. Each producer can publish a message to either a randomly selected partition or a partition semantically determined by a partitioning key/function.</p>
<p>Kafka has the concept of <strong><em>consumer groups</em>.</strong> Each consumer group consists of one or more consumers that jointly consume a set of subscribed topics, i.e., <strong>each message is delivered to only one of the consumers within the group.</strong></p>
<p>Different consumer groups each independently consume the full set of subscribed messages and <strong>no coordination is needed across consumer groups.</strong></p>
<p>The consumers within the same group can be in different processes or on different machines. Our goal is to divide the messages stored in the brokers evenly among the consumers, <strong>without introducing too much coordination overhead.</strong></p>
<p>The writer mentions that the <mark>first decision</mark> was to make a partition within a topic <strong>the smallest unit of parallelism.</strong></p>
<p>This means that at any given time, all messages from one partition are consumed only by a <strong>single consumer</strong> within each consumer group.</p>
<p>Had we allowed multiple consumers to simultaneously consume a single partition, they would have to <strong>coordinate who consumes what messages, which necessitates locking and state maintenance overhead.</strong></p>
<p>In contrast, in our design consuming processes only need to coordinate when the consumers rebalance the load, an infrequent event.</p>
<p>The <mark>second decision</mark> that we made is to not have a central “master” node, but instead let consumers coordinate among themselves in a decentralized fashion.</p>
<p>Adding a master can complicate the system since we have to further worry about master failures.</p>
<p>To facilitate the coordination, we employ a highly available consensus service, <strong>Zookeeper</strong>. If you're unsure about what Zookeeper is, I recommend you check out an article I wrote about it <a target="_blank" href="https://hewi.blog/navigating-the-jungle-of-distributed-systems-a-guide-to-zookeeper-and-leader-election-algorithms">here</a></p>
<p>Kafka uses Zookeeper for the following:</p>
<ol>
<li><p>Detecting the addition and the removal of brokers and consumers.</p>
</li>
<li><p>Triggering a rebalance process in each consumer when the above happens.</p>
</li>
<li><p>Maintaining the consumption relationship and keeping track of the consumed offset of each partition.</p>
</li>
</ol>
<p>When each broker or consumer starts up, it stores its information in a broker or consumer registry in Zookeeper. The broker registry contains the broker’s host name and port, and the set of topics and partitions stored on it. The consumer registry includes the consumer group to which a consumer belongs and the set of topics that it subscribes to. Each consumer group is associated with an ownership registry and an offset registry in Zookeeper. The ownership registry has one path for every subscribed partition and the path value is the id of the consumer currently consuming from this partition.</p>
<p>The offset registry stores for each subscribed partition, the offset of the last consumed message in the partition.</p>
<p>The paths created in Zookeeper are ephemeral for the broker registry, the consumer registry and the ownership registry, and persistent for the offset registry.</p>
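<p>Using the third-party <code>kazoo</code> client, the ephemeral-vs-persistent split looks roughly like this. The paths and values are illustrative, not Kafka's exact registry layout, and the sketch assumes a ZooKeeper server running locally:</p>
<pre><code class="lang-python">from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")  # assumes a local ZooKeeper
zk.start()

# Ephemeral node: disappears automatically if this consumer's session dies,
# which is what lets the others detect the failure and trigger a rebalance.
zk.create("/consumers/group-1/ids/consumer-42",
          b"topics=clicks", ephemeral=True, makepath=True)

# Persistent node: a committed offset must survive consumer restarts.
zk.ensure_path("/consumers/group-1/offsets/clicks/0")
zk.set("/consumers/group-1/offsets/clicks/0", b"128500")

zk.stop()
</code></pre>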
<p>Once a new consumer is added, or changes are propagated to consumers (broker/consumer changes), that consumer runs a <strong>rebalance process</strong> to determine the subset of partitions it should consume from. I recommend reading the rebalance algorithms directly from the paper, as they're explained best there.</p>
<h2 id="heading-delivery-guarantees">Delivery Guarantees</h2>
<p>Kafka guarantees <strong>at-least-once delivery</strong>; exactly-once delivery can be very complex to achieve (it requires something like two-phase commit). Most of the time, though, a message is delivered exactly once to each consumer group.</p>
<p>In the case when a consumer process crashes without a clean shutdown, the consumer process that takes over the partitions owned by the failed consumer may get some duplicate messages that come after the last offset successfully committed to Zookeeper.</p>
<p>If an application cares about duplicates, it must add its own de-duplication logic. This is usually a more cost-effective approach than using two-phase commits.</p>
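<p>A minimal sketch of such de-duplication logic, keyed by the (partition, offset) pair that uniquely identifies a message. The handler names are hypothetical:</p>
<pre><code class="lang-python">def apply_business_logic(message):
    print("processing", message)

processed = set()  # in practice, a bounded or persistent store

def handle(partition, offset, message):
    """At-least-once delivery means replays after a crash, so skip any
    (partition, offset) pair we've already applied."""
    if (partition, offset) in processed:
        return  # duplicate from a replay; skip it
    apply_business_logic(message)
    processed.add((partition, offset))

handle(0, 128, "click-event")
handle(0, 128, "click-event")  # replayed duplicate: processed only once
</code></pre>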
<p>Kafka guarantees that messages from a single partition are delivered to a consumer in order. However, there is no guarantee on the ordering of messages coming from different partitions.</p>
<p>To avoid log corruption, Kafka stores a CRC for each message in the log. If there is any I/O error on the broker, Kafka runs a recovery process to remove those messages with inconsistent CRCs. Having the CRC at the message level also allows us to check network errors after a message is produced or consumed.</p>
<blockquote>
<p>In Apache Kafka, CRC (Cyclic Redundancy Check) is a mechanism used to ensure data integrity. Specifically, Kafka uses CRC to detect errors in the data being transmitted or stored.</p>
<p>Kafka uses CRC checksums in various parts of its architecture to ensure that the data remains intact from the producer to the broker and from the broker to the consumer.</p>
<p>When the data is read or transmitted, the checksum is recalculated and compared with the original checksum. If they don't match, it indicates that the data has been corrupted (a small sketch of this check follows the quote).</p>
</blockquote>
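<p>The check itself is simple. Here's a sketch using Python's built-in <code>zlib.crc32</code>; Kafka's actual on-disk format differs, this only shows the verification idea:</p>
<pre><code class="lang-python">import zlib

def checksum(payload: bytes) -> int:
    return zlib.crc32(payload)

# Producer side: store the CRC alongside the message.
message = b"user-123 clicked ad-456"
stored_crc = checksum(message)

# Broker/consumer side: recompute and compare to detect corruption.
corrupted = message + b"!"
assert checksum(message) == stored_crc        # intact message passes
assert checksum(corrupted) != stored_crc      # corruption is detected
</code></pre>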
<p>In the final section of the paper, the writer talks about the usage of Kafka at LinkedIn. I won't be writing it here, but it's a good read; I'd recommend it 100%.</p>
<h1 id="heading-reference">Reference</h1>
<ul>
<li><a target="_blank" href="https://notes.stephenholiday.com/Kafka.pdf">https://notes.stephenholiday.com/Kafka.pdf</a></li>
</ul>
]]></content:encoded></item></channel></rss>