Interpreter 2: Memory management: talloc

1. Pair programming structure
2. Get started
3. The idea
4. Storing the active list
5. Some specifics
6. Optional Extensions
7. Testing and submitting your work
- 7.1. Testing your work
- 7.2. Submitting your work

In the last assignment, you built a linked list, and wrote code that hopefully cleaned up the list appropriately. Perhaps you have been missing the convenience of using a language with a garbage collection system that spares you from having to remember to clean individual things up. For this assignment, we're going to build an exceedingly simple but effective garbage collector. This garbage collector is so inefficient that this may bother some of you; if so, consider improving the garbage collector to be an optional extension that you can think about when the project is complete.

1. Pair programming structure

This is a "pair" assignment, which means that you and your partner should use the pair programming model. You should be physically together, sharing a computer and both looking at the same screen.

Each partner has an active role to play during pair programming - please review the instructions on Moodle about pair programming, and make sure you understand what you should be doing when you are the navigator and when you are the driver. You should make sure that over the course of an assignment that you each spend roughly the same amount of time "driving." My recommendation is to set a timer and switch drivers every 10-15 minute.

For the interpreter project, you are permitted to do a limited amount of debugging separate from your partner. If you do this, you should be sure that you're each contributing similar amounts: it's not fair, either in terms of time or in terms of learning, if one person does all the debugging. Before debuggging on your own, make sure that your partner is onboard with that. Also, make sure to share what you found with your partner afterwards.

If pair programming in real-time just doesn't work for you and your partner, then you will need to amicably split up and work individually. If you choose this option, you must communicate that to me [Anna] and send me an email with a zip file containing the code checkpoint of any work you did together (if applicable).

If things are not going well with working with your partner or you find yourselves tempted to split up the work, please talk to me. While doing a limited amount of debugging separately is fine, you should be doing the initial design and coding together - if you or your partner cannot explain how part of your interpreter works, then you have not followed the pair programming policies for this class.

2. Get started

Download the folder with the starter code from the link below. Like for previous homeworks, unzip the folder and move it to where you'd like: most likely the folder for the Docker container. Then get started in VS Code.

If when running tests you run into a permissions error, you can fix this by running the following code in the terminal:

chmod a+x test-e test-m

This tells the computer that all users should be permitted to execute (run) the files test-e and test-m.

Click here to download the starter files

Before continuing any further, you'll also need to copy these files from your Linked List submission into this assignment's folder:

linkedlist.c
CollaborationsAndSources.txt

3. The idea

You'll be creating your own replacement for malloc, which we'll call talloc (for "track malloc"). For a user, talloc seems to work just like malloc, in that it allocates memory and returns a pointer to it. Inside your code for talloc, you'll need to call malloc to do exactly that. Additionally, talloc should store the pointer to that memory in a linked list that we'll call the "active list" for purposes of discussion. Every time talloc is called, another pointer to memory gets added to that active list.

You'll then also create a function called tfree, which will free up all memory associated with pointers accumulated from calls to talloc. Calling tfree at arbitrary points in your program would be a complete disaster, as it would free up memory that you may still be using. The idea is that we will be using talloc as a replacement for malloc, and then calling tfree at the very end of our main function. You'll then be able to program with the illusion of using a garbage collector, except that the garbage collector never actually kicks in until the program is about to end.

You'll also write the function texit, which is a simple replacement for the built-in function exit. texit calls exit, but calls tfree first. Note that texit should pass the status parameter it gets to exit.

Finally, you'll then modify your linked list from the previous assignment. The function cleanup that you wrote will be eliminated, as it is no longer necessary. You should also modify reverse so that it no longer duplicates data between the two linked lists. When you reverse a list, that should return a new list with a new set of CONS_TYPE Item nodes, but the actual data in that list should not be copied from the old list to the new. This would be a disaster to try to clean up manually, but tfree will handle it easily. This change will make some later aspects of the project much easier. Your linked list code should now exclusively use talloc, and should not use malloc at all.

4. Storing the active list

One issue you'll need to think through is where the variable for the head of the active list should be. In an object-oriented language, this would likely be a private static variable in a memory management class. Oops. You can't make the active list head a local variable in talloc, because tfree wouldn't be able to see it. We could make it a parameter to talloc and tfree, but then the programmer using talloc has to keep track of this, and could conceivably have multiple active lists, which sounds ugly. This is an occasion where global variable makes sense, and so you should use one. A global variable in C is declared outside of any functions. Typically, it is placed near the top of your file, underneath the include statements.

There's one bit of circular logic you've got to untangle. talloc needs to store a pointer (returned by malloc) onto a linked list. Your linked list code, in turn, uses talloc. Rather than trying to make this work in some complex mutually dependent structure, my recommendation is to break the circularity. In your talloc code, the single linked list that you use to store allocated pointers should be a linked list generated via malloc, instead of talloc. That means you'll need to duplicate some of your linked list code. Duplicated code is generally to be avoided, but avoiding this circular nightmare is worth it.

5. Some specifics

After you download the starter files, you should see the following:

schemeitem.h: this defines the SchemeItem structure again. It adds one additional type of item that you'll want to modify your code to deal with, a PTR_TYPE.
linkedlist.h: this is a modification form the previous assignment that removes the function cleanup, and also changes the documentation on reverse to indicate that data is not to be copied.
talloc.h: this defines the functions that you'll need to write from scratch for this assignment.
main.c: this is a tester function.
justfile: contains instructions for the command just, which will compile and test your code
test-e and test-m: usual
test_utilities.py: helper utilities used by test-e and test-m

The missing files here are linkedlist.c and talloc.c. You should have already copied over linkedlist.c from the previous assignment (then, you'll make the additional changes required by this assignment) and you should create talloc.c from scratch yourself.

To compile your code, issue the command just build at the command prompt. This will follow the instructions in the just file for building your project in order to produce an executable called linkedlist. At first, it won't build at all because your talloc.c and linkedlist.c files aren't there. To get started, copy in linkedlist.c (remove the cleanup function), and for now, within talloc.c just create every function that you need with no code inside it so that you can get everything to build. Once you have done that, you can begin implementing your functions, and testing appropriately.

The tester code creates an executable that you can run by typing ./linkedlist. The easiest way to run the tests is to use ./test-m and ./test-e, as usual, which will automatically compile all of your code and run the ./linkedlist executable for you.

Your code should have no memory errors when running on any input (correct or incorrect) using valgrind. The testing scripts will automatically run valgrind on your code, and show you if there are memory errors.

6. Optional Extensions

Work in this section is 100% optional, and not worth any points. Nonetheless, if you're looking for an extra challenge (or a good way to prepare for an exam), these are fun additional exercises to try.

Write a function that that can report a count, in bytes, of how much memory is being used in total by this talloc technique. It should include both all of the memory that the user asked for when calling talloc, but also all of the additional overhead in Value structs that talloc creates for assembling a linked list. Here is a signature for that function. You can add it to your talloc.h file:

int tallocMemoryCount();

7. Testing and submitting your work

7.1. Testing your work

As with previous assignments, there are M tests and E tests. See the section of Scheme Intro 2 labeled "How to test your work" if you need a refresher on how to run these tests. Note that the interpreter tests will fail if your code has compiler warnings and/or valgrind errors. Make sure to run the tests in the Docker environment.

7.2. Submitting your work

7.2.1. Acknowledging help

You should already have a file CollaborationsAndSources.txt in your working directory (if not, copy it over from the previous assignment). Go to the header for this portion of the assignment and fill in the document. Did you share strategies with anyone else? Talk about any annoying errors and get advice? These are fine things to do, and you should note them in the CollaborationsAndSources.txt file. Give the names of people you talked with (other than your assigned partner) and some description of your interactions. If you used any resources outside of notes taken in class or the homework/lab starter files that Anna provided, that is also something to note.

If you didn't talk with anyone or use any outside sources, please note that explicitly in CollaborationsAndSources.txt.

7.2.2. Generating the submission file and uploading to Gradescope

After completing CollaborationsAndSources.txt and finishing the assignment, run ./zipitup (if you get a permissions error, run chmod u+x ./zipitup first). This script will create a zip file XXX_submission.zip where XXX is the assignment name or number. Note that this zip file with include your C files, any new header files you created, and CollaborationsAndSources.txt. We will re-copy in our version of the tests and of the provided header files. Upload the zip file to Gradescope.

On Gradescope, make sure to add your partner if you worked with a partner. This page has detailed instructions for how to do that.

On Gradescope, you should see autograder output for the same tests as you ran locally. There is a small chance that the tests will fail on Gradescope, even if they passed on your own computer. Usually this means that you coded something in a different way than we expected when writing the autograders. We will be grading based on the results of the tests on Gradescope, so make sure to pay attention to the results of the tests there. (If you run into this issue, check that you are submitting to the correct assignment on Gradescope. Then try resubmitting and rerunning the autograder. If the error messages are legible, feel free to debug further, but it's also fine to reach out to Anna or the graders at this point.)

7.2.3. Grading

You'll earn at least an M if the following conditions are all met:

All M tests pass
A visual inspection of your code shows that you have not hyper-tailored your code to pass the tests. Hyper-tailoring your code would be doing things like checking for the exact input from the test, meaning you haven't written your code in a way that would work for similar inputs.
You have a CollaborationsAndSources.txt file with the information described above.
Your code has significant stylistic or efficiency issues that we have not explicitly covered in class or on the feedback for a previous assignment.

You'll earn an E if:

All conditions above for an M are met.
All the E tests pass.
Your code is not significantly more complex than needed to accomplish the task.
You have a comment before each function describing what the function does (its input/output behavior, not a play-by-play of "first checks… then makes a recursive call …").
Your code follows good style convention.

If your code does not meet the criteria for an M or an E, then you'll earn a grade of NA (Not Assessable) or SP (Some Progress) for the assignment. Not Assessable means that your submission does not have an attempt at one or more functions in the assignment. Some Progress means that your submission has some code (in the correct language) for all functions in the assignment, but that the code does not pass all of the M tests, is hyper-tailored to the tests, and/or is missing a CollaborationsAndSources.txt file.

You are welcome to revise your submission until the revision deadline (see Gradescope).

If you revise your submission, double-check to make sure that the autograder tests pass and that you have included CollaborationsAndSources.txt.

Good luck, ask questions, and have fun!

Assignment based off one by Dave Musicant. Thanks for sharing!