### Steps to self-host gitweb on openbsd

Build a simple web interface for your git repos using mostly built-in functionality of OpenBSD and git.

**Note:** I reconstructed these steps from memory after the fact. If any step fails for you as written, please email me so we can fix it.

#### git and its user

All repos will have public read-only git access. We'll also create a git user for write access over SSH.

```sh
# software deps not in base system (pandoc to render readmes)
pkg_add git pandoc

useradd -m -s $(which git-shell) git

cat <<EOF >> /etc/ssh/sshd_config
Match User git
	AllowAgentForwarding no
	AllowTcpForwarding no
	X11Forwarding no
	PermitTTY no
EOF

# for cleaner read-write git urls
ln -s /var/www/git /home/git/repo
```

#### chroot gitweb deps

The OpenBSD base system contains httpd and slowcgi, which we'll use to run gitweb. By default CGI scripts have a chroot of `/var/www`, so any programs and shared libraries required by gitweb need to be copied there.

```sh
# home directory for projects
install -g daemon -o git -d /var/www/git

# helper scripts from cozy forge
ftp -o - https://dev.begriffs.com/repo-ui/snapshot/main.tar.gz | tar zxf -
cd repo-ui-main-*

# copy required binaries and shared libs to chroot
BINS="
/usr/local/bin/git
/usr/local/libexec/git/git-archive
/usr/bin/perl
"
for path in $BINS; do
	./imprison "$path" /var/www
done

# also copy system perl modules
./imprison-perl-modules /var/www

# the cgi program
cp /usr/local/share/gitweb/gitweb.cgi /var/www/cgi-bin

# gitweb also needs a chrooted /dev/null
mkdir /var/www/dev
mknod /var/www/dev/null c 2 2
chown root:daemon /var/www/dev/null
chmod 0666 /var/www/dev/null

# git hook our repos will use
install -D post-update /var/www/bin/post-update
```

#### generate tls certs

The OpenBSD base system contains acme-client(1) to retrieve https certs from Let's Encrypt. There is plenty of documentation online for this step. Try Roman Zolotarev's [guide](https://romanzolotarev.com/openbsd/acme-client.html).

When finished with this step, you'll have public and private keys in a subdirectory of `/etc/ssl`, and an `/etc/httpd.conf` file for your domain. You'll also have a cron job configured to renew your certificate.

#### configure httpd and gitweb

**Note:** the provided configuration files need customization for your domain name and the directories you used for your certs.

```sh
# default gitweb css and lightweight js
cp -R /usr/local/share/gitweb/static /var/www

# back up your httpd config
cp /etc/httpd.conf /tmp/httpd.conf.bak

# install configuration for gitweb and slowcgi
cp httpd.conf /etc

# edit httpd.conf to customize for your domain
# (consulting your backup as needed)
vi /etc/httpd.conf

# gitweb configuration
install -D gitweb.conf /var/www/conf/gitweb.conf

# update the domain name in the conf
vi /var/www/conf/gitweb.conf

# add a message to your forge homepage
cat <<EOF > /var/www/conf/projects_list_head.html
<p>
Introductory content for your projects list
</p>
EOF

# go time
rcctl restart httpd
```

#### make a repo

All projects are stored as bare repos under `/var/www/git`. This directory is within the slowcgi chroot, unlike somewhere like `/home/git`.

The category and description of each repo, as displayed by gitweb, are stored in plain text files within the bare repo (as part of git's database, not as versioned source code). Additionally, each repo needs a `post-update` hook that calls `git update-server-info` whenever new code is pushed. We're using httpd to serve repos over "dumb https" rather than a dedicated git protocol server, and `update-server-info` prepares the files that make dumb http work.

Use a helper script to do all this:

```sh
# create the foo project, in category bar, with a description
./repo-new foo bar "a wonderful project"
```

The project will expose two git URLs:

* read-only: https://example.com/git/foo
* read-write: git@example.com:repo/foo

**Mirrors**

To mirror dependencies on your own git server, use the helper script:

```sh
# for example, a hypothetical project foo on github
./repo-mirror foo https://github.com/user/foo.git
```

#### block large plagiarism models

Hosting your own projects off GitHub from the start avoids their being automatically used for training. However, any public website will still be scraped, including open source projects exposed by gitweb. The companies doing statistical plagiarism often do not respect robots.txt and must be blocked another way.

One approach is publishing poisoned URLs (ones forbidden by robots.txt) and implementing server-side rules to temporarily ban any IPs accessing those URLs; a rough sketch appears below. Another approach is to make the client perform a somewhat costly computation to process web responses.

The easiest way to deter bots is using CloudFlare's new [bot blocking reverse proxy](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/). While putting a site behind CloudFlare does support the centralization of the internet, it's a quick way to get started while perhaps developing another approach.
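To make the poisoned-URL idea concrete, here is a minimal, untested sketch using only base-system tools. Everything specific in it is an assumption of mine rather than part of the repo-ui helpers: the trap path `/trap/`, the pf table name `bots`, the document root `/var/www/htdocs`, and an access log at `/var/www/logs/access.log` in common/combined format (request path in the seventh field).

```sh
# robots.txt: forbid a trap path that no well-behaved crawler should fetch
# (trap path and document root are hypothetical -- adjust for your setup)
cat <<EOF > /var/www/htdocs/robots.txt
User-agent: *
Disallow: /trap/
EOF

# in /etc/pf.conf, declare a table of banned addresses and block its members:
#   table <bots> persist
#   block in quick from <bots>
# then reload the ruleset: pfctl -f /etc/pf.conf

# run periodically from cron: ban every client that requested the trap
# path, then expire bans older than a day
for ip in $(awk '$7 ~ "^/trap/" { print $1 }' /var/www/logs/access.log | sort -u)
do
	pfctl -t bots -T add "$ip"
done
pfctl -t bots -T expire 86400
```

An invisible link to the trap path somewhere on the site, combined with the robots.txt rule, means only crawlers that ignore robots.txt ever end up in the table.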