Interpreter 2: Memory management: talloc

1. Pair programming structure
2. Get started
3. The idea
4. Storing the active list
5. Some specifics
6. Optional Extensions
7. Testing and submitting your work
- 7.1. Testing your work
- 7.2. Submitting your work

In the last assignment, you built a linked list, and wrote code that hopefully cleaned up the list appropriately. Perhaps you have been missing the convenience of using a language with a garbage collection system that spares you from having to remember to clean individual things up. For this assignment, we're going to build an exceedingly dumb but effective garbage collector. This garbage collector is so inefficient that this may bother some of you; if so, consider improving the garbage collector to be an optional extension that you can think about when the project is complete.

1. Pair programming structure

This is a "pair" assignment, which means that if you are working on a team with someone else, you and your partner should engage in the pair programming model. You should be physically together, sharing a computer and both looking at the same screen. Each partner has an active role to play during pair programming - please review the instructions on Moodle about pair programming, and make sure you understand what you should be doing when you are the navigator and when you are the driver.

You should make sure that over the course of an assignment that you spend roughly the same amount of time each "driving." My recommendation is to take turns approximately every 15 minutes or so. Set a timer to help you remember. I will also ask you to fill out a survey about your partnership at the end of the term.

For the interpreter project, you are permitted to do a limited amount of debugging separate from your partner. If you do this, you should be sure that you're each contributing similar amounts: it's not fair, either in terms of time or in terms of learning, if one person does all the debugging. Before debuggging on your own, make sure that your partner is onboard with that. Also, make sure to share what you found with your partner afterwards.

If pair programming in real-time just doesn't work for you and your partner, then you will need to amicably split up and work individually. If you choose this option, you must communicate that to me [Anna] and send me an email with a zip file containing the code checkpoint of any work you did together (if applicable).

If things are not going well with working with your partner or you find yourselves tempted to split up the work, please talk to me. While doing a limited amount of debugging separately is fine, you should be doing the initial design and coding together - if you or your partner cannot explain how part of your interpreter works, then you have not followed the pair programming policies for this class.

2. Get started

You'll download the folder with the starter code from the link below. Like for previous homeworks, unzip the folder and move it to where you'd like: most likely the folder for the Docker container. Then get started in VS Code.

If when running tests you run into a permissions error, you can fix this by running the following code in the terminal:

chmod a+x test-e test-m

This tells the computer that all users should be permitted to execute (run) the files test-e and test-m.

Click here to download the starter files

Before continuing any further, you'll also need to copy these files from your Linked List submission into this assignment's folder:

linkedlist.c
CollaborationsAndSources.txt

3. The idea

You'll be creating your own replacement for malloc, which we'll call talloc (for "track malloc"). For a user, talloc seems to work just like malloc, in that it allocates memory and returns a pointer to it. Inside your code for talloc, you'll need to call malloc to do exactly that. Additionally, talloc should store the pointer to that memory in a linked list that we'll call the "active list" for purposes of discussion. Every time talloc is called, another pointer to memory gets added to that active list.

You'll then also create a function called tfree, which will free up all memory associated with pointers accumulated do to calls to talloc. Calling tfree at arbitrary points in your program would be a complete disaster, as it would free up memory that you may still be using. The idea is that we will be using talloc as a replacement for malloc, and then calling tfree at the very end of our main function. You'll then be able to program with the illusion of using a garbage collector, except that the garbage collector never actually kicks in until the program is about to end.

You'll also write the function texit, which is a simple replacement for the built-in function exit. texit calls exit, but calls tfree first. Note that texit should pass the status parameter it gets to exit.

Finally, you'll then modify your linked list from the previous assignment. The function cleanup that you wrote will be eliminated, as it is no longer necessary. You should also modify reverse so that it no longer duplicates data between the two linked lists. When you reverse a list, that should return a new list with a new set of CONS_TYPE SchemeItem nodes, but the actual data in that list should not be copied from the old list to the new. This would be a disaster to try to clean up manually, but tfree will handle it easily. This change will make some later aspects of the project much easier. Your linked list code should now exclusively use talloc, and should not use malloc at all.

4. Storing the active list

One issue you'll need to think through is where the variable for the head of the active list should be. In an object-oriented language, this would likely be a private static variable in a memory management class. Oops. You can't make the active list head a local variable in talloc, because tfree wouldn't be able to see it. We could make it a parameter to talloc and tfree, but then the programmer using talloc has to keep track of this, and could conceivably have multiple active lists, which sounds ugly. This is an occasion where global variable makes sense, and so you should use one. A global variable in C is declared outside of any functions. Typically, it is placed near the top of your file, underneath the include statements.

There's one bit of circular logic you've got to untangle. talloc needs to store a pointer (returned by malloc) onto a linked list. Your linked list code, in turn, uses talloc. Rather than trying to make this work in some complex mutually dependent structure, my recommendation is to break the circularity. In your talloc code, the single linked list that you use to store allocated pointers should be a linked list generated via malloc, instead of talloc. That means you'll need to duplicate some of your linked list code. Duplicated code is generally to be avoided, but avoiding this circular nightmare is worth it.

5. Some specifics

After you download the starter files, you should see the following:

schemeitem.h: this defines the SchemeItem structure again. It adds one additional type of item that you'll want to modify your code to deal with, a PTR_TYPE.
linkedlist.h: this is a modification form the previous assignment that removes the function cleanup, and also changes the documentation on reverse to indicate that data is not to be copied.
talloc.h: this defines the functions that you'll need to write from scratch for this assignment.
main.c: this is a tester function.
justfile: contains instructions for the command just, which will compile and test your code
test-e and test-m: usual
test_utilities.py: helper utilities used by test-e and test-m

The missing files here are linkedlist.c and talloc.c. You should have already copied over linkedlist.c from the previous assignment (then, you'll make the additional changes required by this assignment) and you should create talloc.c from scratch yourself.

To compile your code, issue the command just build at the command prompt. This will follow the instructions in the just file for building your project in order to produce an executable called linkedlist. At first, it won't build at all because your talloc.c and linkedlist.c files aren't there. To get started, copy in linkedlist.c (remove the cleanup function), and for now, within talloc.c just create every function that you need with no code inside it so that you can get everything to build. Once you have done that, you can begin implementing your functions, and testing appropriately.

The tester code creates an executable that you can run by typing ./linkedlist. The easiest way to run the tests is to use ./test-m and ./test-e, as usual, which will automatically compile all of your code and run the ./linkedlist executable for you.

Your code should have no memory errors when running on any input (correct or incorrect) using valgrind. The testing scripts will automatically run valgrind on your code, and show you if there are memory errors.

6. Optional Extensions

Work in this section is 100% optional, and not worth any points. Nonetheless, if you're looking for an extra challenge (or a good way to prepare for an exam), these are fun additional exercises to try.

Write a function that that can report a count, in bytes, of how much memory is being used in total by this talloc technique. It should include both all of the memory that the user asked for when calling talloc, but also all of the additional overhead in Value structs that talloc creates for assembling a linked list. Here is a signature for that function. You can add it to your talloc.h file:

int tallocMemoryCount();

7. Testing and submitting your work

7.1. Testing your work

Go back and look at the sections at the end of Scheme Intro 2 labeled "How to test your work". Everything there about how to run M tests and E tests applies identically here. Note that the tests will fail if your code has compiler warnings and/or valgrind errors, in addition to if its behavior is observably incorrect.

7.2. Submitting your work

You'll submit your work on Gradescope. You should already have a file CollaborationsAndSources.txt in your working directory (if not, copy it over from the previous assignment). Go to the header for this portion of the assignment and fill in the document. Indicate in what ways (if any) you collaborated with other people on this assignment (did you talk with anyone other than Anna or our prefect? Did you share strategies with anyone? Get or give advice on debugging? These are fine things to do, and you should note them in the CollaborationsAndSources.txt file). If you used any resources outside of your notes from class, that is also something to note in CollaborationsAndSources.txt. If you didn't talk with anyone or use any outside sources, please note that explicitly. Look back at the Scheme Intro 1 instructions if you want more details on the CollaborationsAndSources.txt file.

After updating CollaborationsAndSources.txt and finishing the assignment, run ./zipitup

This will zip up your C files, header files, and collaborations and sources. We will be re-copying in our version of the tests and of the included header files.

Then, upload the created zip to Gradescope.

On Gradescope, make sure to add your partner if you worked with a partner. This page has detailed instructions for how to do that.

On Gradescope, you should see autograder output for the same tests as you ran locally. There is a small but nonzero chance that you have coded something unexpected that cause the tests pass on the computer you’re working on, but not on Gradescope. We will be grading this assignment based on the results of the tests on Gradescope, so you should make sure to pay attention to how the tests run there as well when you submit.

For grading, you'll earn at least an M if the following conditions are all met:

All M tests pass
A visual inspection of your code shows that you have not hyper-tailored your code to pass the tests. Hyper-tailoring your code would be doing things like checking for the exact input from the test, meaning you haven't written your code in a way that would work for similar inputs (see Homework policies for more on this).
You have a CollaborationsAndSources.txt with the information described above.
The work is submitted by the late deadline.

You'll earn an E if:

All conditions above for an M are met.
All the E tests pass.
Your code is not significantly more complex than needed to accomplish the task.
You have a comment before each function describing what the function does (its input/output behavior, not a play-by-play of "first checks… then makes a recursive call …").
The work is submitted by the regular deadline or it is submitted by the late deadline and you elect to use a late token by submmitting a regrading request.

You may earn a grade of M or SP (Some Progress) and be offered to resubmit for an improved grade if:

All tests pass (all E tests if you can resubmit for an E, and all M tests if you can resubmit for an M)
Your code has significant stylistic or efficiency issues that we have not explicitly covered in class or on the feedback for a previous assignment.

If your code does not meet the criteria for an M or an E, then you'll earn a grade of NA (Not Assessable) or SP (Some Progress) for the assignment. Not Assessable means that your submission does not have an attempt at one or more functions in the assignment. Some Progress means that your submission has some code (in the correct language) for all functions in the assignment, but that the code does not pass all of the M tests, is hyper-tailored to the tests, and/or is missing a CollaborationsAndSources.txt file.

Good luck, ask questions, and have fun!

Assignment based off one by Dave Musicant. Thanks for sharing!