These are my notes for building a new Proxmox 8 server.

Proxmox 8 Server Build (PVE2)

I’ve got new hardware and want to bring this server up at the same as as the original so need new IP addresses (which will become static)

Use balena etcher to create a USB bootable disk

Boot disks RAID1 - Mirror

alt text

RAID1 (Mirror) on m.2 drives

https://www.youtube.com/watch?v=lhRPhBDysVI

Going to separate out boot drive from data. So that if data drives fail I can boot.

# can see any errors
# zfs pool is called: rpool
zpool status

Networking

  • Hostname FQDN - pve2.dave.home
  • IP - 192.168.1.88 (was assigned by router DHCP, then made static). It may be easier to assign manually an IP next time outside the DHCP range.
  • Gateway - 192.168.1.254 (the router)
  • DNS - 192.168.1.254 (the router)

https://192.168.1.88:8006 - PM admin interface

Updates

Disable Enterprise repos

Add no subscription repos

alt text

PVE 8.2.4 running and updated on new hardware.

Data Disks - RAIDZ-1 - ZFS Datapool

I’ve got 4 data disks which I want to be running for resilience.

RAIDZ-1 (tolerates 1 drive loss) which needs at least 3 disks

I don’t need anything special (just want redundancy but not overly) so lets use RAIDZ-1

alt text

pve2, disks, zfs, create ZFS

  • datapool - (tank is common name)
  • compression on (saves on writes, and cpu is cheap)
  • ashift - 12 (default)
zpool status
zfs list

Datadirectory on ZFS Datapool

alt text

DataDirectory

Now I can upload an ISO

Ubuntu 24.04.1 LTS

Ubuntu 22.04.5 - my current standard build.

Create a VM

https://www.youtube.com/watch?v=sZcOlW-DwrU craft computing pm 8 - very good explanations as to what the different settings are.

alt text

BIOS

alt text

disks - give 80GB - on ZFS created with thin provisioning, so it only uses whats needed

alt text

max performance of the VM. Min of 4 threads/cores. I have 28 cores in 1 socket.

alt text

memory max - give 4GB at least only uses what needed (ballooning)

alt text

and

alt text

Start up the VM. It got an IP from the DHCP server of 192.168.1.251, and as I installed SSH I can just use my terminal to connect

# so pm can see the IP address
sudo apt install qemu-guest-agent

Firewall and DHCP Networking

I want to run multiple websites inside PM. I used to run pfSense for firewalls and internal DHCP.

Is it possible to use just the proxmox firewall and router DHCP?

Update - have decided to keep running pfSense because my DHCP server (router) is behaving strangely, and pfsense has worked for many years.

video - for running internal pm firewall this is good.

pfSense 2.7.2

I’m going to reinstall from scratch pfSense to understand it more

https://www.dlford.io/pfsense-nat-how-to-home-lab-part-3/ - great tutorial

Download pfSense - 2.7.2. Need to checkout (free) and unzip download into iso then upload to proxmox2. It was a BETA version but that was all I could get.

networking

proxmox2, system, network, create, linux bridge

  • vmbr0 is there already. connected to en01 (wired ethernet). 192.168.1.88/24
  • vmbr1 - comment: internal server network

reboot server

Create pfsense vm

create vm

Follow tutorial. TODO - see what seetting worked and tweak

Disk use datadirectory

WAN: vtnet0 LAN: vtnet1

Install CE

Disk - ZFS and GPT. Stripe. No redudancy.

Reboot

WAN: vtnet0 LAN: vtnet1 (don’t do this)

192.168.1.179 is allocated from my DHCP server to WAN side ie vtnet0

As pfSense uses 192.168.1.x by default on the LAN we can’t use this to start with.

Option 3? to reassign network ie disable LAN interface

Option 8 to go into shell

# turn off firewall (so can access admin from WAN)
# http://192.168.1.179
pfctl -d 

# u: admin
# p: pfsense
  • hostname: pfsense2
  • domain: dave.home
  • dns: 8.8.8.8, 1.1.1.1 (I’m was using 192.168.1.254 on pm host - overridden now)

Static IP - forcing just to use the DHCP allocated 192.168.1.179

Can’t Conntect

  • 1) Assign interface - disable LAN
  • 8) Shell - pfctld -d
  • try another browser (Chrome does bad caching - Firefox is useful especially as we’re on http by default)

Add firewall rule to WAN to allow admin interface

Firewall > Aliases > Ports > Add

pfSense_Admin_Ports

80, 443, 22, 8080

alt text

Add a rule.

Couldn’t get the rule to reorder to go above teh

Interface, WAN

Uncheck block bogon networks

8080 for Admin Interface

So that 80 and 443 can be freed up for regular web traffic.

System, Advanced, Admin Access, TCP Port 8080, HTTPS

Add LAN Interface

Interfaces, Interface Assignments

vtnet1 (network port)

Interfaces, LAN, Enable

Static IPv4

172.16.44.1/24

System, Advanced, Networking, Kea DHCP (As ISC DHCP is Deprecated)

Services, DHCP Server, LAN

Range: 172.16.44.10 - 172.16.44.100 (I had to add the range then enable)

Enable

alt text

Make sure the proxmox side has the LAN range, so it can get access to the internet.

Template Server VM

Create a normal VM on the vmbr1 internal networking

I’ve got an IP address on 172.16.44.10

but can’t see the internet - was a problem with proxmox LAN range not enabled.

n: blue
u: dave
p: letmein

sudo apt update
sudo apt upgrade
sudo apt install -y qemu-guest-agent

sudo visudo

# add this under sudo ALL=(ALL:ALL_ ALL)
dave ALL=(ALL) NOPASSWD:ALL

alt text

Tick guest-trim

Shutdown VM, right click, Convert to Template

Create a new VM based off Template

Right click on template and create new VM

setup port forward so can SSH to it

Get MAC of new VM from UI

eg: BC:24:11:64:57:A8

Firewall, NAT, Port Forward

alt text

ssh pfsense2 -p 103


# run boilerplate
# https://www.dlford.io/secure-ssh-access-how-to-home-lab-part-5/#boilerplate
sudo bash -c "bash <(wget -qO- https://raw.githubusercontent.com/dlford/ubuntu-vm-boilerplate/master/run.sh)"

DHCP Reserve (so IP doesn’t change)

Services, DHCP Server, LAN, Edit Static Mapping (at bottom)

MAC and IP eg 172.16.44.103

Change the Port Forward

Firewall, NAT, Port Forward

Port forward for 80 and 443 from WAN (websites)

sudo apt install nginx

Lets open up 172.16.44.103 ports 80 and 443

alt text

Firewall, NAT, Port Forward (this adds associated filter rule)

Okay this did work by testing my hitting my public IP from http.

Mounts and Shares

# unmount from old PM
umount /dev/sdf
# or
umount /mnt/wd-2tb-usb
umount /mnt/windows-share

# mount back on old PM
mount /dev/sdd /mnt/black-256-stick

mount /dev/sdf /mnt/wd-2tb-usb


# new PM
# list usb devices - Western Digital Elements 2621 is my disk
lsusb

# look for sdb, sdc etc.. with correct size of disk - mine was sdc
# sde is 250GB stick
# sdf is 1.8TB drive
# this shows the mount if it is there
lsblk

# shows drives and mounted on
df -h

mkdir /mnt/black-256-stick
mkdir /mnt/wd-2tb-usb

mount /dev/sde /mnt/black-256-stick
mount /dev/sdf /mnt/wd-2tb-usb

8th Oct - having some strange issues with slowing down on backups (taking an hour) on my black-256-stick. The 2tb drive is okay.

alt text

Adding the mounted disk into PM

Survive a reboot

I’m not sure if windows share does survive a reboot. I had to do mount -h to get it to wake up again.

# find device eg /dev/sdf - this is my 2tb wd drive
lsblk

# find uuid of device rather than /dev/sdd (as this may change)
# eg ab6e004f-6c99-4d33-b145-4208201c404e
blkid

vi /etc/fstab

# UUID=<UUID> <mount_point> <filesystem_type> <options> <dump> <pass>
# dump 0.. used for backup utility
# pass 2... root filesystems use 1 

# wd-2tb-usb
UUID=ab6e004f-6c99-4d33-b145-4208201c404e /mnt/wd-2tb-usb ext4 defaults,nofail 0 2

# to make sure system picks up new version of fstab
systemctl daemon-reload

# test the changes without rebooting
mount -a

# see if mount point is now active
df -h

Spin down USB External disk when not in use

Lets save energy (and less noise) by spinning down the drive when I don’t need it.

# check which drive in the usb to spin down.. /dev/sde 
df -h

# put drive into standby mode
hdparm -y /dev/sde

# I go this error but it did go into standby mode
#  issuing standby command
# SG_IO: bad/missing sense data, sb[]:  f0 00 01 00 50 40 00 0a 00 00 00 00 00 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

It does seem to spin up again when I ask it to show backups

Lets see if on spin down state, it wakes up and does a backup at 0100 - yes it does.

# crontab -e
# add this line to crontab
0 3,15 * * * hdparm -y /dev/sde

New VM DHCP and NAT

It gave it a DCHP IP, so lets make it static.

1) Static IP - Services, DHCP Server, LAN, Edit Static Mapping (at bottom)

96:30:EC:A6:0C:93

172.16.44.104

2) Open so can SSH - Firewall, NAT, Port Forward

3) Open up port 80 to this new reverse proxy.

Change name of restored machine

# https://www.dlford.io/secure-ssh-access-how-to-home-lab-part-5/#boilerplate
sudo bash -c "bash <(wget -qO- https://raw.githubusercontent.com/dlford/ubuntu-vm-boilerplate/master/run.sh)"

# on dev - shortcut 
sudo vim ~/.bash_aliases
alias 4='ssh pfsense2 -p 114'

Open share on Windows 10

right click on folder, properties, sharing, Advanced Sharing, Share this folder

create local user on win: bob password …. not an email user.. just local.

Windows features

Windows firewall - Allow an app through firewall. File Sharing both.

mkdir /mnt/windows-share

# I am inside proxmox in a 172.16 address range.. need a way to get out
mount -t cifs -o username=bob,password='xxxxxxx' //192.168.1.152/proxmox /mnt/windows-share

# automate mounting
vi /etc/fstab
//192.168.1.152/proxmox /mnt/windows-share cifs username=bob,password='secret',iocharset=utf8 0 0


# nofail - allow boot if mount not available
# automaout - systemd will try to mount the share on access rather than boot
//192.168.1.152/proxmox /mnt/windows-share cifs username=bob,password='secret',iocharset=utf8,nofail,x-systemd.automount 0 0


# **using this**
# notice no quotes on password
//192.168.1.152/proxmox /mnt/windows-share cifs username=bob,password=secret,iocharset=utf8,vers=3.0,nofail 0 0

# tests fstab
mount -a 

# got some weird reboot issues...
# and could only see share when I manually when to it inside bash.. then proxmox could see it.


ZFS

  • nvme0n1 - 465G
  • nvme1n1 - 465G

  • sda 1tb
  • sdb 1tb
  • sdc 1tb
  • sdd 1td

  • sde - 233G
  • sdf 1.8TB
zpool status
  • datapool - data
  • rpool - mirror
# my rpool is full up (mirror)
df -k

du -h --max-depth=1

umount /dev/sde

# show mounts
mount

alt text

lsblk with black-256-stick and wd-2tb-usb mounted

SSD Wearout

2% wearout since 1st Sept (3 months).. went from 1 to 2% on 3rd Dec

https://www.reddit.com/r/Proxmox/comments/ss07l4/guide_to_minimizing_ssdnvme_wearout_with_proxmox/

trim

/tmp/zfs_daily_report.sh

#!/bin/bash

# Variables
POOL_NAME="datapool"
LOG_FILE="/var/log/zfs_daily_iostat_$(date +%Y-%m-%d).log"

# Capture zpool I/O stats
zpool iostat -v $POOL_NAME > $LOG_FILE
echo "Daily ZFS I/O stats logged to $LOG_FILE"

cron.d

0 0 * * * /tmp/zfs_daily_report.sh

Lets monitor and see what read/writes are and get a baseline

other tips

https://www.reddit.com/r/Proxmox/comments/i3mupd/proxmox_standalone_or_cluster/

Disk space running out

No idea why I’m using 31GB of storage (out of 44GB assigned) - general ubuntu. With 12GB available. ie sitting at 74% used. My app fails when it is over 90% used (browsertrix webcrawler docker image)

Okay 70% now..

docker system prune

# 6.5GB in home dave23-6.5

# 4GB in the .cache
# pipenv - 1.4GB
du -h /home/dave/ | sort -rh | head -n 20

# where is the 16.5GB gone?
# lots in var
9.4G    /var
6.0G    /var/lib
5.1G    /var/lib/docker/overlay2
5.1G    /var/lib/docker
3.3G    /var/log/journal
3.3G    /var/log
2.4G    /var/log/journal/dd03133706587e6d2ee267a56561951d
1.5G    /var/lib/docker/overlay2/4e642950a0ef1adac39237a3b4adf1758a1c360961dce9cb050eddc0b9af8d3c/diff
1.5G    /var/lib/docker/overlay2/4e642950a0ef1adac39237a3b4adf1758a1c360961dce9cb050eddc0b9af8d3c
1.4G    /var/lib/docker/overlay2/4e642950a0ef1adac39237a3b4adf1758a1c360961dce9cb050eddc0b9af8d3c/diff/usr
1.3G    /var/lib/docker/overlay2/2bc258eac131501c668ce3d972d374437086dd5206259949dbb1332df119fd22/diff
1.3G    /var/lib/docker/overlay2/2bc258eac131501c668ce3d972d374437086dd5206259949dbb1332df119fd22
1.2G    /var/lib/docker/overlay2/2bc258eac131501c668ce3d972d374437086dd5206259949dbb1332df119fd22/diff/usr
833M    /var/log/journal/7e59cee94454465bcc02272f66f6c80e
762M    /var/lib/docker/overlay2/4e642950a0ef1adac39237a3b4adf1758a1c360961dce9cb050eddc0b9af8d3c/diff/usr/lib
748M    /var/lib/docker/overlay2/2bc258eac131501c668ce3d972d374437086dd5206259949dbb1332df119fd22/diff/usr/lib
618M    /var/lib/snapd
569M    /var/lib/docker/overlay2/9002c508d1f0079ecd77e58ce1da246ddf97bbcb36a424a85a9054260344dc54/diff
569M    /var/lib/docker/overlay2/9002c508d1f0079ecd77e58ce1da246ddf97bbcb36a424a85a9054260344dc54
402M    /var/lib/docker/overlay2/4e642950a0ef1adac39237a3b4adf1758a1c360961dce9cb050eddc0b9af8d3c/diff/usr/share

VM Websites / APIs

http://mateer.hopto.org/ is the name I got from https://www.noip.com/ which my router talks to to provide dynamic DNS.

I’ve got an Nginx reverse proxy in front of all websites which forwards traffic to a specific internal IP via http. It also handles certificates via certbot

Maybe move over

Test sites

Sites hosted elsewhere

VM Workers

  • General archiver eg auto-archiver.com/ - Python which runs fine in 4GB RAM
  • Tweet service that polls an Azure database every minute to see if any Tweets to send out. Python. 1GB RAM
  • YouTube screenshotter archiver - I’m using playwright.dev via a Proxy brightdata.com/ and using headful Firefox

Networking

Plusnet Two Router / DHCP is on 192.168.1.254

Proxmox is on a static IP allocation of 192.168.1.139 - port 8006 gives us the Proxmox web UI

pfSense VM has a DHCP (!fix this) alloction of 192.168.1.144 - port 8080 gives us the pfSense web UI

All port 80 and 443 traffic is routed to pfSense from the router.

Power Costs

Big server may draw 300W at idle. eg dual xeon

My UK provider good energy charges me 24.70 pence per kWh on 3rd Sept 2024. And 59.18 pence per day

So the xeon server would cost 0.3 kW x 24 hours = 7.2 kWh/day = £1.78 / day £53 / month

My 13th Gen Intel draws about 40W at idle running wifi, disks, etc..

RGB Machine

40W at rest it draws = 0.04 kW * 24 hours = 0.96 kWh/day = £0.237 per day = £7 per month

Storage

Drive Writes Per Day (DWPD) - how many times the entire capacity can be written per day over warrenty period. eg 1TB SSD DWPD rating of 1 - can write 1TB every day for 3 to 5 years.

SATA DOM (Disk on Module) - often used for boot drives. SATA SSD - 2.5” nvme Mechanical - for boot not recommended.. as SSD is cheaper, lower power and more reliable.

For boot drives approx 10 TB/year is a good number.. so shouldn’t wear out the SSD at all.

So after my drive failure(s) of consumer ssd’s, I want to make my server more resilient.

  • m.2 2 * 1TB just for extra relaibility (check DWPD). He uses 2 x 256 AirDisk SSD
  • SSD

Mirror the m.2 drives for the OS

## Logs

Look for problems eg

Sep 02 06:54:11 pve smartd[718]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 68 to 69
Sep 02 06:54:12 pve smartd[718]: Device: /dev/sdb [SAT], 8 Currently unreadable (pending) sectors
Sep 02 06:54:12 pve smartd[718]: Device: /dev/sdb [SAT], 8 Offline uncorrectable sectors

My sdb is a mechanical disk I’m not using.

## Disks

alt text

Have got 26% wearout on this Samsung SSD 840 Pro. This is my backup server which I’m rebuilding soon.

This is nothing to worry about and drive shoudl still have plenty of life left.

Servers Setup

alt text

BIOS settings for my silver server (which had SSD and potential hardware failture). Notice auto power on.

Router Setup

alt text

Router thinks that .139 is not connected (but it is as the PM Admin interface answers). 1 port having 2 IP addresses is confusing the GUI for the router.

There are 2 .139 addresses as the MAC address changed when I got the backup PM server.

(Static) means the local IP address is assigned by the device. It was initially assigned DHCP when Proxmox was installing, now it is fixed. Interestinly I can’t change the router to ‘always use this IP address’ as it isn’t assigned by DHCP anymore. I hope DHCP wont assign .88 anymore.

I did deleted some entries which were not used anymore, and the router did some strange things like being extremely slow (and management interface loading slowly) and needed a boot.

alt text

All 80 and 443 traffic is forwarded to pfsense on 192.168.1.144 (whc)

This router can assign 64 IP addresses on DHCP

Lessons Learned

  • SSD’s fail.
  • Hardware fails.
  • Standard routers are fickle (DHCP leases sometimes get confused / UI gets confused)
  • Electricity cuts do happen (PM did recover fine)
  • I want a backup PM server ready to go (and a testing server as well)
  • You always need more disk space than you think on a VM
  • Home connection with dynamic DNS has been great. Was 48Mbps down and 12MMbps up. Now on fibre at 500/110.
  • Backups are super important
  • Template VM’s useful for spinning up test VM’s quickly
  • pfSense is hard
  • System has been very reliable
  • Been invaluable for archiving as many services now (eg YouTube) will block cloud based IP’s