The push script
This post is going to be shorter and more of a note. Someone pointed out that my feed.xml was botched: it contained dev data rather than production data, with links pointing to localhost instead of the actual domain name. So I felt the need to write a quick script that makes pushing new versions of the website easier.
As I was about to create a GitHub repository to share the code, I realized: why not host the code on aethok itself? So much for this post being short…
Setting up a git server
At first, I thought that code hosting works by running a git server that provides the code on request. This is only partially true. I do have the server – the computer serving aethok.com – but I don't need to run a special program on it to serve the code. I also don't have to configure nginx in any way to let it serve the code, at least not at first.
Instead, hosting a repository works by creating a special user on the server and piggybacking on SSH to transfer files. The user doesn't actually need a special name. "git" is just the convention, which is why it shows up in SSH clone URLs such as "git@gitserver:/srv/git/project.git".
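To make the "piggybacking on SSH" part concrete, here is a small Python sketch of how an scp-style URL splits into an SSH user, a host and a path on that host. It also shows roughly which command a clone then asks SSH to run on the server. This is a simplification under my own assumptions: the helper names are hypothetical, and real git also accepts ssh:// URLs, bracketed IPv6 hosts and a few other forms. The server-side programs are real, though: "git-upload-pack" serves fetches and clones, and "git-receive-pack" handles pushes.

```python
import re
from typing import List, NamedTuple


class ScpLikeUrl(NamedTuple):
    user: str
    host: str
    path: str


# Simplified pattern for "user@host:path"; the "user@" part is optional.
_SCP_LIKE = re.compile(r"^(?:(?P<user>[^@/:]+)@)?(?P<host>[^@/:]+):(?P<path>.+)$")


def parse_scp_like(url: str) -> ScpLikeUrl:
    """Split an scp-style git URL into SSH user, host and repository path."""
    match = _SCP_LIKE.match(url)
    if match is None:
        raise ValueError(f"not an scp-like URL: {url!r}")
    return ScpLikeUrl(match["user"] or "", match["host"], match["path"])


def ssh_command(url: str, service: str = "git-upload-pack") -> List[str]:
    """Build (roughly) the ssh invocation git makes; pushes use git-receive-pack."""
    parsed = parse_scp_like(url)
    target = f"{parsed.user}@{parsed.host}" if parsed.user else parsed.host
    # git logs in as the given user and runs the service against the repo path.
    return ["ssh", target, f"{service} '{parsed.path}'"]


print(ssh_command("git@gitserver:/srv/git/project.git"))
# ['ssh', 'git@gitserver', "git-upload-pack '/srv/git/project.git'"]
```

So the "git" user is just the SSH login, and the path after the colon is an ordinary directory on the server.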
Something the guide I linked earlier doesn't mention is how to create /srv/git and who should own it. I created it with the root account and changed ownership to the git user, but beats me whether this is the right thing to do. (UPDATE: seems to be okay.)
Now, before pushing the repository, I'd like to rewrite the history so that the author shows up as my aethok.com user rather than my GitHub user. This article has the key right at the end: a useful script for "git filter-branch". That still left me with an alternative hidden history, the backup of the old one that filter-branch keeps under refs/original/. I deleted it using "git update-ref -d".
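For anyone wondering what such an --env-filter script actually does: for every commit, git exposes the identity as the environment variables GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL, and the filter may overwrite them. The Python function below is only a model of that per-commit logic, not the script from the article. The function name and the example emails are hypothetical placeholders.

```python
from typing import Dict


def rewrite_identity(env: Dict[str, str], old_email: str,
                     new_name: str, new_email: str) -> Dict[str, str]:
    """Return a copy of one commit's identity variables with old_email replaced.

    Author and committer are checked separately, as env-filter scripts usually do.
    """
    updated = dict(env)
    for role in ("AUTHOR", "COMMITTER"):
        if env.get(f"GIT_{role}_EMAIL") == old_email:
            updated[f"GIT_{role}_NAME"] = new_name
            updated[f"GIT_{role}_EMAIL"] = new_email
    return updated


commit_env = {
    "GIT_AUTHOR_NAME": "old-name", "GIT_AUTHOR_EMAIL": "old@example.com",
    "GIT_COMMITTER_NAME": "old-name", "GIT_COMMITTER_EMAIL": "old@example.com",
}
print(rewrite_identity(commit_env, "old@example.com", "new-name", "new@example.com"))
```

After the real rewrite of a branch called main, the pre-rewrite copy lives at refs/original/refs/heads/main. That is the ref "git update-ref -d refs/original/refs/heads/main" removes.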
So far so good. Time to push the initial commit to the remote! Aaaand… it fails. "git push origin main" fails. More precisely, it hangs without providing any further information. This is going to be fun. […] Well, actually, not too much fun. It turns out that SSH, which git calls under the hood, wasn't getting to my server through the domain name. So you have to duplicate the domain's DNS record (that is, its IP address) in ~/.ssh/config, as described here. After that, the repo is pushed.
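Before hard-coding an address into ~/.ssh/config, it can help to check what the system resolver actually returns for the name. Below is a tiny diagnostic sketch built on Python's socket.getaddrinfo, which is the same kind of lookup ssh performs. The helper name is hypothetical, and whether /etc/hosts or DNS is consulted first depends on your system's configuration.

```python
import socket
from typing import Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}
```

For example, compare resolve_ipv4("aethok.com") with the server's real IP. If the result doesn't contain that IP, the name and SSH aren't talking about the same machine.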
To be fair, the original guide does mention “If you’re running it internally, and you set up DNS for gitserver to point to that server…”, but I wouldn’t call this DNS :) As such, I didn’t understand that this was a requirement. Oh well, we all learn. A quick check that cloning the repo works … and it does! It’s just empty?! What gives?
I see: cloning prints a teeny warning at the end: "warning: remote HEAD refers to nonexistent ref, unable to checkout." Who would think this is a problem, right? It's probably related to my rewriting of the history. And as always, Googling doesn't take long to yield a solution: I need to set HEAD with "git symbolic-ref" on the host.
But just before I do that, let's poke at it a little. "git symbolic-ref HEAD" tells me it points to "refs/heads/master". Interesting, since I remember the default branch name moving to "main" some time ago. (It was a little tricky to track down, but the original announcement dates back to June 2020. Strictly speaking, git itself still defaults to "master"; Git 2.28 added the init.defaultBranch setting so that you can pick "main" instead.) It's a nice change. I like it. But why is my default branch on the server still called "master"? "git --version" gives me 2.20. On my client, it's 2.35. Updating and upgrading and updating again changes nothing. I can't be bothered to look up how old 2.20 is at the moment, but I'm guessing that's the root of the problem.
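To check whether a given git can even be told to default to "main", you can look at its version. init.defaultBranch arrived in Git 2.28, so a 2.20 server predates it. Here is a small Python sketch that parses the output of "git --version" and compares it against that release. The function names are hypothetical.

```python
import re
from typing import Tuple


def parse_git_version(output: str) -> Tuple[int, ...]:
    """Extract the numeric part of `git --version` output, e.g. 'git version 2.20.1'."""
    match = re.search(r"git version (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"unrecognised output: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def supports_default_branch_setting(version: Tuple[int, ...]) -> bool:
    """init.defaultBranch was introduced in Git 2.28."""
    return version >= (2, 28)
```

Tuples compare element by element, so (2, 20, 1) counts as older than (2, 28) and (2, 35, 1) as newer. Platform suffixes like ".windows.2" are simply ignored by the pattern.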
Oh well. I know how to fix it now, so I follow the solution I mentioned earlier. Cloning the repo now gives no warning, and you can see the commits there. Nice!
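What the warning means, in plain terms: the bare repository's HEAD file names a branch that doesn't exist there. In this case, HEAD said refs/heads/master while only main had been pushed. The Python sketch below reads a bare repository's files directly to detect that. It assumes the classic "files" ref storage (loose ref files plus an optional packed-refs file), and the helper names are hypothetical. On the server, the actual fix is "git symbolic-ref HEAD refs/heads/main", run inside the repository directory.

```python
from pathlib import Path
from typing import Optional


def head_target(repo: Path) -> Optional[str]:
    """Return the ref HEAD points to (e.g. 'refs/heads/master'), or None if detached."""
    content = (repo / "HEAD").read_text().strip()
    prefix = "ref: "
    return content[len(prefix):] if content.startswith(prefix) else None


def ref_exists(repo: Path, ref: str) -> bool:
    """Look for ref as a loose file or as an entry in packed-refs."""
    if (repo / ref).is_file():
        return True
    packed = repo / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            # Entries are "<sha> <refname>"; '#' lines are headers, '^' lines peel tags.
            if line.startswith(("#", "^")):
                continue
            parts = line.split(" ", 1)
            if len(parts) == 2 and parts[1] == ref:
                return True
    return False


def dangling_head(repo: Path) -> Optional[str]:
    """Return HEAD's target if it names a branch that doesn't exist, else None."""
    target = head_target(repo)
    if target is not None and not ref_exists(repo, target):
        return target
    return None
```

Running dangling_head(Path("/srv/git/project.git")) before the fix would return "refs/heads/master" in a situation like the one above. After the fix, it returns None.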
Setting up an unauthenticated read-only option
I thought I had to allow unauthenticated reads via HTTP, but that's not true either: it can also be done over the git protocol. This time, there is a daemon involved, as explained by the next article in the git documentation. At this point, it feels like a good time to do some background reading and see what the git book has to say about the available options.
And I’m happy I did this. It turns out, “[the git protocol is] also probably the most difficult protocol to set up”. So following the example of GitHub, I think I’ll use HTTP for read-only access after all.
The relevant chapter from the git documentation mentions using Apache, but I’m using nginx for aethok.com. I could try to be smart and figure it out, but why, when the internet is buzzing with information! Buzz buzz. And sure enough, the first article I find is already looking promising.
For the time being, I won't install "apache2-utils" (which provides htpasswd), because I don't want to allow any authentication over HTTP; I only want to use HTTP for reading. Let's see if I need it later.
Hmm, the article suggests I change the owner of /srv/git to www-data, but I'm afraid this will break my SSH access. I'm going with just changing the group, as described in the chapter from the book I mentioned earlier, hoping that it will be enough. (After a little confusion, it is indeed possible for a file to be owned by both a user and a group: "stat
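For the record, every file on Linux has exactly one owning user and one owning group, and separate permission bits apply to each. That is why changing only the group can let www-data read /srv/git while the git user keeps owning it. The Python sketch below reports all three pieces, roughly like the owner, group and mode columns of "ls -l". The function name is hypothetical, and the numeric fallback covers IDs that have no name on the system.

```python
import grp
import os
import pwd
import stat
from typing import Tuple


def describe_ownership(path: str) -> Tuple[str, str, str]:
    """Return (owner, group, permission string) for path."""
    info = os.stat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:  # uid with no passwd entry, e.g. inside some containers
        owner = str(info.st_uid)
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:  # gid with no group entry
        group = str(info.st_gid)
    # filemode turns st_mode into the familiar 'drwxr-x---' style string.
    return owner, group, stat.filemode(info.st_mode)
```

The middle three characters of the permission string are what members of the owning group (www-data, in this setup) are allowed to do.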
After my initial excitement, the article starts to seem a little suspicious. The author suggests using "sudo cd …", which makes me doubt their credibility: cd is a shell builtin, so running it through sudo can't change the directory of your own shell. Still, I push forward to see if it gets me where I want to go… And I give up on it. It still involves too much thinking on my side. So I move on to the next example, which is just a gist on GitHub.
Well, this still doesn’t work, so perhaps I should try the git protocol after all.
Back to the chapter about setting up the git protocol from the git book. Setting it up seems quite easy, but I cannot clone… I bang my head against it for 15 minutes. It's obvious the problem is the same as before, but I resist using the IP address instead of the domain name. When I finally try it, it works. Sadness. So now I need to add the same mapping a third time, this time to my /etc/hosts. There must be a better way to do this, but I'm tired at the moment, and since it works I'll leave it at that.
Git repository: online! You can get the code with "git clone git://aethok.com/aethok". My conclusion is that the git protocol is not the hardest to set up, contrary to the claim in the git documentation.
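When a git:// clone fails, it helps to separate two cases: "the daemon isn't listening" and "the name doesn't lead to the server". git daemon listens on TCP port 9418 by default. The Python sketch below just tries to open a TCP connection, so you can compare the domain name with the raw IP address. The helper name and the timeout value are my own choices.

```python
import socket

GIT_DAEMON_PORT = 9418  # git daemon's default TCP port


def port_reachable(host: str, port: int = GIT_DAEMON_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and name-resolution errors
        return False
```

If port_reachable("aethok.com") is False while the same call with the server's IP address is True, the daemon is fine and the name is the problem.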
The push.sh script
So yeah, a few hours later, I can point you to my page-uploading script. Just "git clone git://aethok.com/aethok" and take a look at "push.sh". Its features are: 1) protecting against a dev push; 2) checking checksums to avoid duplicate pushes; 3) creating a backup of the site before pushing it; and 4) doing the actual push, of course.
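push.sh itself isn't reproduced here. The helpers below are hypothetical Python sketches of features 1 and 3. The first is a guard that lists built files still mentioning a dev-only marker like localhost, which is exactly what was wrong with feed.xml. The second writes a timestamped .tar.gz backup. The function names, the marker list, the file suffixes and the archive naming are all assumptions.

```python
import tarfile
import time
from pathlib import Path
from typing import Iterable, List


def find_dev_leaks(site_dir: Path, markers: Iterable[str] = ("localhost",),
                   suffixes: Iterable[str] = (".html", ".xml")) -> List[Path]:
    """List built files that still mention a dev-only marker such as 'localhost'."""
    marker_list = tuple(markers)
    suffix_list = tuple(suffixes)
    leaks = []
    for path in sorted(site_dir.rglob("*")):
        if path.is_file() and path.suffix in suffix_list:
            text = path.read_text(encoding="utf-8", errors="replace")
            if any(marker in text for marker in marker_list):
                leaks.append(path)
    return leaks


def backup_site(site_dir: Path, backup_dir: Path) -> Path:
    """Pack site_dir into a timestamped .tar.gz inside backup_dir; return its path."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive = backup_dir / f"site-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the site folder.
        tar.add(site_dir, arcname=site_dir.name)
    return archive
```

A push would stop early if find_dev_leaks returns anything, and call backup_site only once the guard passes. The backup directory should live outside the site directory, so old backups don't end up inside new ones.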
I feel the way I checksum the files is quite neat. I basically sort them by name, concatenate them all into one giant file, and call "md5sum" on it. Perhaps there's a simpler way, but this is the best I've got at the moment. As usual, the important thing is that it works. It's also more efficient than the alternative of unzipping a backup and comparing the contents, since only the checksums need to be compared.
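The same idea fits in a few lines of Python. This sketch is not the code in push.sh. It feeds every file's bytes into one MD5 digest, in order of relative path, which gives the same hash as running md5sum on one big concatenated file. The helper names and the idea of storing the last checksum in a separate file are hypothetical.

```python
import hashlib
from functools import partial
from pathlib import Path


def site_checksum(site_dir: Path) -> str:
    """MD5 of all files' contents, concatenated in order of their relative paths."""
    digest = hashlib.md5()
    files = sorted((p for p in site_dir.rglob("*") if p.is_file()),
                   key=lambda p: p.relative_to(site_dir).as_posix())
    for path in files:
        with path.open("rb") as handle:
            # Updating chunk by chunk hashes the same bytes as one concatenated file.
            for chunk in iter(partial(handle.read, 65536), b""):
                digest.update(chunk)
    return digest.hexdigest()


def needs_push(site_dir: Path, checksum_file: Path) -> bool:
    """True unless checksum_file holds the same checksum as the current build."""
    current = site_checksum(site_dir)
    previous = checksum_file.read_text().strip() if checksum_file.exists() else None
    return current != previous
```

One trade-off of hashing only contents: renaming a file without changing its bytes leaves the checksum unchanged. Feeding each relative path into the digest as well would catch that, if it matters. MD5 is fine here because this is change detection, not security.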
As I said, I’m pleased with it.