Project #1: A Simple Twitter Client/Server System

Due: Thursday, November 4th at 11:59pm

This project is intended to serve as introduction to concurrent objects and other low-level primitive objects.

Note

There will be aspects of this assignment that are not the most efficient way to implement various components. It will be your job in a future assignment to analyze the implementation of this project and think about how components can be improved or aspects removed to speedup performance. Keep this in mind while working on this project.

Getting started

For each assignment, a Git repository will be created for you on GitHub. However, before that repository can be created for you, you need to have a GitHub account. If you do not yet have one, you can get an account here: https://github.com/join.

To actually get your private repository, you will need this invitation URL:

When you click on an invitation URL, you will have to complete the following steps:

  1. You will need to select your CNetID from a list. This will allow us to know what student is associated with each GitHub account. This step is only done for the very first invitation you accept.

Note

If you are on the waiting list for this course you will not have a repository made for you until you are admitted into the course. I will post the starter code on Ed so you can work on the assignment until you are admitted into the course.

  1. You must click “Accept this assignment” or your repository will not actually be created.

  2. After accepting the assignment, Github will take a few minutes to create your repository. You should receive an email from Github when your repository is ready. Normally, it’s ready within seconds and you can just refresh the page.

  3. You now need to clone your repository (i.e., download it to your machine).
    • Make sure you’ve set up SSH access on your GitHub account.

    • For each repository, you will need to get the SSH URL of the repository. To get this URL, log into GitHub and navigate to your project repository (take into account that you will have a different repository per project). Then, click on the green “Code” button, and make sure the “SSH” tab is selected. Your repository URL should look something like this: git@github.com:mpcs52060-aut21/proj1-GITHUB-USERNAME.git.

    • If you do not know how to use git clone to clone your repository then follow this guide that Github provides: Cloning a Repository

If you run into any issues, or need us to make any manual adjustments to your registration, please let us know via Ed Discussion.

Assignment: Single-User Twitter Feed

For this assignment you are only allowed to use the following Go concurrent constructs:

  • go statement

  • sync.Mutex and its associated methods.

  • sync/atomic package. You may use any of the atomic operations.

  • sync.WaitGroup and its associated methods.

You cannot use Go channels (i.e., chan) or anything related to channels in this assignment! If you are unsure about whether you are able to use a language feature then please ask on Ed before using it.

Part 1: Twitter Feed

Imagine you are a new software engineer at Twitter and your first assignment is to redesign the data structure that represents a user’s feed. Your implementation will redefine it as a linked list. If you are unaware about linked lists then I recommend you look over this resource:

Your task to implement the remaining incomplete methods of a feed (i.e. the Add, Remove, and Contains methods). You must use the internal representations for type feed struct and and type post struct in your implementation. Do not implement a parallel version of the linked list at this point. You can only add fields to the struct but cannot modify the original fields given.

Test your implementation of feed by using the test file called feed_test.go. You should only run the sequential tests first:

  • TestSimpleSeq

  • TestAdd

  • TestContains

  • TestRemove

As a reminder from prior assignments, you can run individual tests by using the -run flag when using go test. This flag takes in a regular expression to know which tests to run. Make sure are you in the directory that has the *_test.go file.

Sample run of the SimpleSeq test:

//Run top-level tests matching "SimpleSeq", such as "TestSimpleSeq".
$ go test -run "SimpleSeq"
PASS
ok    proj1/feed      0.078s

Sample run of the SimpleSeq and TestAdd tests:

//Run top-level tests matching "SimpleSeq" such as "TestSimpleSeq" and "TestAdd".
$ go test -v -run "SimpleSeq|TestAdd"
=== RUN   TestSimpleSeq
--- PASS: TestSimpleSeq (0.00s)
=== RUN   TestAdd
--- PASS: TestAdd (0.00s)
PASS
ok    proj1/feed      0.078s

Part 2: Thread Safety using a Read-Write Lock

A read/write lock mechanism allows multiple readers to access a data structure concurrently, but only a single writer is allowed to access the data structures at a time. Implement a read/write lock library that only uses a single condition variable and mutex for its synchronization mechanisms. Go provides a Read/Write lock that is implemented using atomics:

As with the Go implementation, you will need to provide four methods associated with your lock:

  • Lock()

  • Unlock()

  • RLock()

  • RUnlock()

These methods should function exactly like their Go counterparts. Documentation on each method’s functionality is described by the link provided above. You must limit the max number of readers to 64.

Coarse Grain Feed

Now, go back to your feed library inside of feed.go and make it thread-safe by using your implementation of a read/write lock. You need to think about the appropriate places to call the various read/write locking and unlocking methods in the feed methods. This must be a coarse grain implementation. Test that this is working by running the remaining tests inside the feed_test.go file.

Helpful Resource(s):

Part 3: A Twitter Server

Using the feed library from Part 1, you will implement a server that processes requests, which perform actions (e.g., add a post, remove a post, etc.) on a single feed. These requests come from a client program (e.g., the twitter mobile app) where a user can request to modify their feed. The server sends responses back to the client with a result (e.g., notification that a post was added and/or removed successfully, etc.).

The client and server must agree on the format to send the requests and responses. One common format is to use JSON. For our server, requests and responses are single strings in the JSON format. If you are unfamiliar with the JSON standard then you should read up on it before beginning this portion of the assignment. JSON is a widely used serialization format that you will mostly encounter at some point in your career. Thus, it’s better that you learn it now so you can be comfortable with it in the future. How to work with JSON data inside Go is described here:

Make sure you fully understand the “Streaming Encoders and Decoders” section. You’ll be using encoders and decoders in this part heavily.

The client and server will need to use decoders and encoders to parse the JSON string into a type easily usable in Go. You will use the json.Decoder, which acts as a streaming buffer of requests, and json.Encoder, which acts as a streaming buffer of responses. This model is a simplified version of a real-life client-server model used heavily in many domains such as web development. At a high-level, you can think of your program as a simple “server” in the client-server model illustrated below:

../../_images/cs_model.png

Requests (i.e., tasks in our program) are sent from a “client” (e.g., a redirected file on the command line, a task generator program piped into your program, etc.) via os.Stdin. The “server” (i.e., your program) will process these requests and send their results back to the client via os.Stdout. This model is a mimicking a real-life client-server model used heavily in many domains such as web development; however, we are not actually implementing web client-server system in this assignment.

Requests and Responses Format

The basic format for the requests coming in from json.Decoder will be of the following format:

{
"command": string,
"id": integer,
... data key-value pairings ...
}

A request will always have a "command" and "id" key. The "command" key holds a string value that represents the type of feed task. The "id" represents an unique identification number for this request. Requests are processed asynchronously by the server so requests can be processed out of order from how they are received from json.Decoder; therefore, the "id" acts as a way to tell the client that result coming back from the server is a response to an original request with this specific "id" value. Thus, it is not your responsibility to maintain this order and you must not do anything to maintain it in your program.

The remaining key-value pairings represent the data for a specific request. The following subsections will go over the various types of requests.

Add Request

An add request adds a new post to the feed data structure. The "command" value will always be the string "ADD". The data fields include a key-value pairing for the message body ("body": string) and timestamp ("timestamp": number). For example,

{
  "command": "ADD",
  "id": 342,
  "body": "just setting up my twttr",
  "timestamp": 43242423
}

After completing a "ADD" request, the goroutine assigned the request will send a response back to the client via json.Encoder acknowledging the add was successful. The response is a JSON object that includes a success key-value pair ("success": boolean). For an add request, the value is always true since you can add an infinite number of posts. The original identification number should also be included in the response. For example, using the add request shown above, the response message is

{
  "success": true,
  "id": 342
}

Remove Request

A remove request removes a post from the feed data structure. The "command" value will always be the string "REMOVE". The data fields include a key-value pairing for the timestamp ("timestamp": number) that represents the post that should be removed. For example,

{
  "command": "REMOVE",
  "id": 2361,
  "timestamp": 43242423
}

After completing a "REMOVE" task, the goroutine assigned the task will send a response back to the client via json.Encoder acknowledging the remove was successful or unsuccessful. The response is a JSON object that includes a success key-value pair ("success": boolean). For a remove request, the value is true if the post with the requested timestamp was removed, otherwise assign the key to false. The original identification number should also be included in the response. For example, using the remove request shown above, the response message is

{
  "success": true,
  "id": 2361
}

Contains Request

A contains request checks to see if a feed post is inside the feed data structure. The "command" value will always be the string "CONTAINS". The data fields include a key-value pairing for the timestamp ("timestamp": number) that represents the post to check. For example,

{
  "command": "CONTAINS",
  "id": 2362,
  "timestamp": 43242423
}

After completing a "CONTAINS" task, the goroutine assigned the task will send a response back to the client via json.Encoder acknowledging whether the feed contains that post. The response is a JSON object that includes a success key-value pair ("success": boolean). For a contains request, the value is true if the post with the requested timestamp is inside the feed, otherwise assign the key to false. The original identification number should also be included in the response. For example, using the contains request shown above, the response message is

{
  "success": false,
  "id": 2362
}

Note: Assuming we removed the post previously.

Feed Request

A feed request returns all the posts within the feed. The "command" value will always be the string "FEED". Their are no data fields for this request. For example,

{
  "command": "FEED",
  "id": 2,
}

After completing a "FEED" task, the goroutine assigned the task will send a response back to the client via json.Encoder with all the posts currently in the feed. The response is a JSON object that includes a success key-value pair ("feed": [objects]). For a feed request, the value is a JSON array that includes a JSON object for each feed post. Each JSON object will include a "body" key ("body": string) that represents a post’s body and a "timestamp" key ("timestamp": number) that represents the timestamp for the post. The original identification number should also be included in the response. For example, assuming we inserted a few posts into the feed, the response should look like

{
  "id": 2
  "feed":[
        {
          "body": "This is my second twitter post",
          "timestamp": 43242423
        },
        {
          "body": "This is my first twitter post",
          "timestamp": 43242420
        }
        ]
}

Done Request

If client will no longer send requests then it sends a done request. The "command" value will always be the string "DONE". Their are no data fields for this request. For example,

{
  "command": "DONE"
}

This notifies server it needs to shutdown (i.e., close down the program). A done request signals to the main goroutine that no further processing is necessary after this request is received. No response is sent back to the client. Make sure to handle all remaining requests in the and responses before shutting down the program.

Implementing the Server

Inside the server/server.go file, you will see the following code,

type Config struct {
  Encoder *json.Encoder // Represents the buffer to encode Responses
  Decoder *json.Decoder // Represents the buffer to decode Requests
  Mode    string        // Represents whether the server should execute
                        // sequentially or in parallel
                        // If Mode == "s"  then run the sequential version
                        // If Mode == "p" then run the parallel version
                        // These are the only values for Version
  ConsumersCount int    // Represents the number of consumers to spawn
  BlockSize      int    // Represents the maximum number of tasks a consumer
                        // can process at any given point in time.
}

//Run starts up the twitter server based on the configuration
//information provided and only returns when the server is fully
// shutdown.
func Run(config Config) {
  panic("TODO")

}

When a goroutine calls the Run function, it will start the server based on the configuration passed to the function. Read over the documentation above to understand the members of the configuration struct. This function does not return (i.e., it is a blocking function) until the server is shutdown (i.e., it receives the "DONE" request). You must not modify the Config struct or the function header for the Run function because the tests rely on this structure. The following sections explain the modes of the server.

Parallel Version: Tasks Queue

Inside the Run function, if config.Mode == "p" then the server will run the parallel version. This version is implemented using a task queue. This task queue is another work distribution technique and your first exposure to the producer-consumer model. In this model, the producer will be the main goroutine and its job is to collect a series of tasks and place them in a queue structure to be executed by consumers (also known as workers). The consumers will be spawned goroutines. You must implement the parallelization as follows:

  1. The main goroutine begins by spawning a specified config.ConsumersCount goroutines, where each will begin executing a function called func consumer(...). It is up to you to decide the arguments you pass to this function. Each goroutine will either begin doing work or go to sleep in a conditional wait if there is no work to begin processing yet. This “work” is explained in Steps 3 and 4. Your program cannot spawn additional goroutines after this initial spawning by the main goroutine.

  2. After spawning the consumer goroutines, the main goroutine will call a function func producer(...). Again, what you pass to this function is for you to decide. Inside the producer function, the main goroutine reads in from json.Decoder a series of tasks (i.e., requests). For the sake of explicitness, the tasks will be feed operations for a single user-feed that the program manages. The main goroutine will place the tasks inside of a queue data structure and do the following:

    • If there is at least one consumer goroutine waiting for work then place a task inside the queue and wake one consumer up.

    • Otherwise, the main gorountine continues to place tasks into the queue. Eventually, the consumers will grab the tasks from the queue at later point in time.

  1. Inside the func consumer(...) function each consumer goroutine will try to grab at least a config.BlockSize amount of tasks from the queue before processing any task. If there is less than a config.BlockSize amount of tasks in the queue then the consumer grabs all the tasks from the queue and starts to execute them. When a consumer finishes executing its block of tasks, it checks the queue to grab another config.BlockSize amount of tasks. If there are no tasks in the queue then it will need to wait for more tasks to process or exit its function if there are no remaining tasks to complete.

Additional Queue Requirements

You must implement this queue data structure so that both the main and worker goroutines have access to retrieve and modify it. All work is placed in this queue so workers can grab a block of tasks when necessary. Along with the requirements defined in this section, the actual enqueuing and dequeuing of items must also be done in a unbounded lock-free manner (i.e., non-blocking). However, the code to make the producer signal to consumers, and consumers to wait on work must be done using a condition variable.

You may want to separate out the queue implementation into its own package and then have server.go import it. This design is up to you. However, the producer and consumer functions must always remain in server.go. I will also allow you to separate out the producer/consumer condition variable code from the unbounded lock-free queue code. Again, this is for you to decide.

Sequential Version

You will need to write a sequential version of this program where the main goroutine processes and executes all the requested tasks without spawning any gorountines.

Part 4: The Twitter Client

Inside the twitter/twitter.go, you must define a simple Twitter client that has the following usage and required command-line argument:

Usage: twitter <number of consumers> <block size>
    <number of consumers> = the number of goroutines (i.e., consumers) to be part of the parallel version.
    <block size> = the maximum number of tasks a goroutine can process at any given point in time.

The program needs to create a server.Config value based off the above arguments. If <number of consumers> and <block size> is not entered then this means you need to run your sequential version of the server; otherwise, run the parallel version. The json.Decoder and json.Encoder should be created by using os.Stdin and os.Stdout. Please refer back to the “Streaming Encoders and Decoders” section here: JSON in Go. The last call in the main function is to start running the server. Once the Run function returns than the program exits. Don’t over think this implementation. It should be simple and short.

Assumptions: No error checking is needed. All tasks read-in will be in the correct format with all its specified data. All command line arguments will be given as specified. You will not need to printout the usage statement. This is shown for clarity purposes.

Sample Files/Test Cases

You may want to create a few sample files with a few tasks within them (all on separate lines). For example, you could create a file called tasks.txt with the following contents:

{"command": "ADD", "id": 1, "body": "just setting up my twttr", "timestamp": 43242423}
{"command": "ADD", "id": 2, "body": "Another post to add", "timestamp": 43242421}
{"command": "REMOVE", "id": 3, "timestamp": 43242423}
{"command": "DONE"}

You could then use file redirection to supply those files to your twitter program as such:

$ go run twitter.go 2 1 < tasks.txt

Note you are not reading from a file in this assignment! You are using file redirection that is part of terminal to supply os.Stdin with the contents of the file specified on the command line.

I will supply you with a few test cases at the beginning of the Week 5 to help you test for correctness. However, as programmers, I want you to be able to test your programs yourself before I provide you with additional test cases.

Part 5: Benchmarking Performance

In Part 5, you will test the execution time of your parallel implementation by averaging the elapsed time of the twitter tests. You are required to run the timings on the CS Peanut cluster. We have provided you with a SLURM script that has most of the configuration setup for you. Follow these steps to benchmark your twitter program:

  1. Commit all your work to your project repository.

  2. Login to the CS remote server and git clone your repository in your home directory.

    Note

    The course website provides documentation for logging in remotely (I would highly recomemnd using the VSCode way):

    If you prefer a visual login mechanism then Techstaff has also setup a Virtual Desktop that mimics being at a CS machine in CSIL:

  3. Read over the SLURM overview created here:

    Read from the beginning until the Interactive Jobs section. If you are still confused about SLURM then we will cover it during the M5 synchronous session (when available watch that video over again).

  4. Change directories into the grader directory (i.e., proj1/grader), and inside you’ll see the benchmark-proj1.sh file. Here are the configurations flags you will need to change:

    • --mail-user=CNETID@cs.uchicago.edu : Change this to the email address you would like to get notified when your job has completed. It can be any email address.

    • --chdir=ABSOLUTE_PATH_TO_PROJ1_GRADER_DIRECTORY: Remove the ABSOLUTE_PATH_TO_PROJ1_GRADER_DIRECTORY text and place the absolute path to your project 1 grader directory. You can get this easily by run the pwd command in the terminal window. Make sure you are inside the proj1/grader directory when you run pwd.

  1. Inside the proj1/grader directory, make a directory called slurm (i.e., run mkdir slurm) and inside the newly created slurm directory make a directory called out (i.e., run mkdir out inside slurm). This is where the output for you job will be placed.

  2. You can now submit your job by running sbatch benchmark-proj1.sh.

Grading for Part #5

An “A” grade for this project requires a high performing solution; therefore, the breakdown for grading part 5 is as follows:

  • Full points (13 points): You must complete the following two requirements to get full points:

    1. You must pass all tests (i.e., by running go run proj1/grader proj1).

    2. The elapsed time for a single run of the tests inside twitter directory (go test -v in proj1/twitter) in seconds must be at most 150 seconds The elapsed time is shown as the last line produced by the tests cases:

      PASS
      ok      proj1/twitter   103.976s
      

      After running it a few times on --partition=debug and seeing consistent results, switch over to running it on --partition=general and add the --exclusive flag. The benchmark-proj1.sh only has the single line go test proj1/twitter -v -count=1 but when grading we will run the tests 5 times and consider the average time when running it using the --exclusive flag.

  • Partial points (5 points): You must complete the following two requirements to get partial credit:

    1. You must pass all tests (i.e., by running go run proj1/grader proj1).

    2. The elapsed time for a single run of the tests inside twitter directory (go test -v in proj1/twitter) in seconds must be at most 300 seconds.

  • No points (0 points): No points will be given for this part to solutions that do not pass all tests by running go run proj1/grader proj1 and/or exceed 300 seconds for a single run of the tests inside twitter directory (go test -v in proj1/twitter).

Grading

Programming assignments will be graded according to a general rubric. Specifically, we will assign points for completeness, correctness, design, and style. (For more details on the categories, see our Assignment Rubric page.)

The exact weights for each category will vary from one assignment to another. For this assignment, the weights will be:

  • Completeness: 60%

  • Correctness: 20%

  • Design & Style: 7%

  • Performance: 13%

Obtaining your test score

The completeness part of your score will be determined using automated tests. To get your score for the automated tests, simply run the following from the Terminal. (Remember to leave out the $ prompt when you type the command.)

$ cd grader
$ go run proj1/grader proj1

This should print total score after running all test cases inside the individual problems. This printout will not show the tests you failed. You must run the problem’s individual test file to see the failures.

Your actual completeness score will come from running the grader on the CS Peanut cluster manually by the graders. We have provided a slurm script inside grader/grader-slurm.sh. Follow the steps 1-4 from Part 5: Benchmarking Performance and then run:

sbatch grader-slurm.sh

For this assignment, there will be no autograder on Gradescope. We will run the grader on the CS Peanut cluster and will manually enter in the score into Gradescope. However, you must still submit your final commit to Gradescope.

There is maximum timeout limit of 10 minutes for each test. No credit is given to that specific test if it exceeds that limit. SLURM has a maximum job time for 10 minutes; therefore, you may have to run them on a test by test basis if your go run proj1/grader proj1 takes in total more than 10 minutes. However, you will still get full points as long as each single test run is below 10 minutes, regardless of how long it takes in total.

Design, Style and Cleaning up

Before you submit your final solution, you should, remove

  • any Printf statements that you added for debugging purposes and

  • all in-line comments of the form: “YOUR CODE HERE” and “TODO …”

  • Think about your function decomposition. No code duplication. This homework assignment is relatively small so this shouldn’t be a major problem but could be in certain problems.

Go does not have a strict style guide. However, use your best judgment from prior programming experience about style. Did you use good variable names? Do you have any lines that are too long, etc.

As you clean up, you should periodically save your file and run your code through the tests to make sure that you have not broken it in the process.

Submission

Before submitting, make sure you’ve added, committed, and pushed all your code to GitHub. You must submit your final work through Gradescope (linked from our Canvas site) in the “Project #1” assignment page via two ways,

  1. Uploading from Github directly (recommended way): You can link your Github account to your Gradescope account and upload the correct repository based on the homework assignment. When you submit your homework, a pop window will appear. Click on “Github” and then “Connect to Github” to connect your Github account to Gradescope. Once you connect (you will only need to do this once), then you can select the repsotiory you wish to upload and the branch (which should always be “main” or “master”) for this course.

  2. Uploading via a Zip file: You can also upload a zip file of the homework directory. Please make sure you upload the entire directory and keep the initial structure the same as the starter code; otherwise, you run the risk of not passing the automated tests.

As a reminder, for this assignment, there will be no autograder on Gradescope. We will run the grader on the CS Peanut cluster and will manually enter in the score into Gradescope. However, you must still submit your final commit to Gradescope.