### Steps to self-host gitweb on openbsd

Build a simple web interface for your git repos using mostly built-in functionality of OpenBSD and git.

**Note:** I reconstructed these steps from memory after the fact. If any step fails for you as written, please email me so we can fix it.

#### git and its user

All repos will have public read-only git access. We'll also create a git user for write access over SSH.

```sh
# software deps not in base system (pandoc to render readmes)
pkg_add git pandoc

useradd -m -s $(which git-shell) git

cat <<EOF >> /etc/ssh/sshd_config
Match User git
	AllowAgentForwarding no
	AllowTcpForwarding no
	X11Forwarding no
	PermitTTY no
EOF

# for cleaner read-write git urls
ln -s /var/www/git /home/git/repo
```

#### chroot gitweb deps

The OpenBSD base system contains httpd and slowcgi, which we'll use to run gitweb. By default CGI scripts have a chroot of `/var/www`, so any programs and shared libraries required by gitweb need to be copied there.

```sh
# home directory for projects
install -g daemon -o git -d /var/www/git

# helper scripts from cozy forge
ftp -o - https://dev.begriffs.com/repo-ui/snapshot/main.tar.gz | tar zxf -
cd repo-ui-main-*

# copy required binaries and shared libs to chroot
BINS="
/usr/local/bin/git
/usr/local/libexec/git/git-archive
/usr/bin/perl
"
for path in $BINS; do
	./imprison "$path" /var/www
done

# also copy system perl modules
./imprison-perl-modules /var/www

# the cgi program
cp /usr/local/share/gitweb/gitweb.cgi /var/www/cgi-bin

# gitweb also needs a chrooted /dev/null
mkdir /var/www/dev
mknod /var/www/dev/null c 2 2
chown root:daemon /var/www/dev/null
chmod 0666 /var/www/dev/null

# git hook our repos will use
install -D post-update /var/www/bin/post-update
```

#### generate tls certs

The OpenBSD base system contains acme-client(1) to retrieve https certs from Let's Encrypt. There is plenty of documentation online for this step. Try Roman Zolotarev's [guide](https://romanzolotarev.com/openbsd/acme-client.html).

When finished with this step, you'll have public and private keys in a subdirectory of `/etc/ssl`, and an `/etc/httpd.conf` file for your domain. You'll also have a cron job configured to renew your certificate.

#### configure httpd and gitweb

**Note:** the provided configuration files need customization for your domain name and the directories you used for your certs.

```sh
# default gitweb css and lightweight js
cp -R /usr/local/share/gitweb/static /var/www

# back up your httpd config
cp /etc/httpd.conf /tmp/httpd.conf.bak

# install configuration for gitweb and slowcgi
cp httpd.conf /etc

# edit httpd.conf to customize for your domain
# (consulting your backup as needed)
vi /etc/httpd.conf

# gitweb configuration
install -D gitweb.conf /var/www/conf/gitweb.conf

# update the domain name in the conf
vi /var/www/conf/gitweb.conf

# add a message to your forge homepage
cat <<EOF > /var/www/conf/projects_list_head.html
<p>
Introductory content for your projects list
</p>
EOF

# go time
rcctl restart httpd
```

#### make a repo

All projects are stored as bare repos under `/var/www/git`. This directory is within the slowcgi chroot, unlike somewhere like `/home/git`.

The category and description of each repo, as displayed by gitweb, are stored in plain text files within the bare repo (as part of git's database, not as versioned source code). Additionally, each repo needs a `post-update` hook that calls `git update-server-info` whenever new code is pushed. We're using httpd to serve repos over "dumb https" rather than a dedicated git protocol server, and `update-server-info` prepares the files that make dumb http work.

Use a helper script to do all this:

```sh
# create the foo project, in category bar, with a description
./repo-new foo bar "a wonderful project"
```

The project will expose two git URLs:

* read-only: https://example.com/git/foo
* read-write: git@example.com:repo/foo

**Mirrors**

To mirror dependencies on your own git server, use the helper script:

```sh
# for example, a hypothetical project foo on github
./repo-mirror foo https://github.com/user/foo.git
```

#### block large plagiarism models

Hosting your own projects off GitHub from the start avoids their being automatically used for training. However, any public website will still be scraped, including open source projects exposed by gitweb. The companies doing statistical plagiarism often do not respect robots.txt and must be blocked another way.

One approach is publishing poisoned URLs (ones forbidden by robots.txt) and implementing server-side rules to temporarily ban any IPs accessing those URLs; a rough sketch appears below. Another approach is to make the client perform a somewhat costly computation to process web responses.

The easiest way to deter bots is using CloudFlare's new [bot blocking reverse proxy](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/). While putting a site behind CloudFlare does support the centralization of the internet, it's a quick way to get started while perhaps developing another approach.
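To make the poisoned-URL idea concrete, here is a minimal, untested sketch using only base-system tools. Everything specific in it is an assumption of mine rather than part of the repo-ui helpers: the trap path `/trap/`, the pf table name `bots`, the document root `/var/www/htdocs`, and an access log at `/var/www/logs/access.log` in common/combined format (request path in the seventh field).

```sh
# robots.txt: forbid a trap path that no well-behaved crawler should fetch
# (trap path and document root are hypothetical -- adjust for your setup)
cat <<EOF > /var/www/htdocs/robots.txt
User-agent: *
Disallow: /trap/
EOF

# in /etc/pf.conf, declare a table of banned addresses and block its members:
#   table <bots> persist
#   block in quick from <bots>
# then reload the ruleset: pfctl -f /etc/pf.conf

# run periodically from cron: ban every client that requested the trap
# path, then expire bans older than a day
for ip in $(awk '$7 ~ "^/trap/" { print $1 }' /var/www/logs/access.log | sort -u)
do
	pfctl -t bots -T add "$ip"
done
pfctl -t bots -T expire 86400
```

An invisible link to the trap path somewhere on the site, combined with the robots.txt rule, means only crawlers that ignore robots.txt ever end up in the table.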