Infrastructure as Code - which tool?

TL;DR - I think you’re probably best off with Terraform

Preface

These are highly opinionated pieces. While they are based on many years of professional experience in the industry, they don’t take into account individual nuance. Posts are mostly going to focus on AWS tooling as that’s where most of my experience is. What better way to kick off than with CloudFormation?

CloudFormation

You might hear some very opinionated (wait, this blog post is opinionated…) people claim that CloudFormation sucks. And I kind of agree. However if used in a very specific way it can be pretty useful.

CloudFormation originally only supported JSON templates. I believe the intention AWS had was that CloudFormation would only ever be written by tools (hence the later release of CDK), however that kind of sucked and people only ever hand crafted templates, leading to CloudFormation eventually supporting YAML.

This concept of machines only ever managing CloudFormation is important, however. To understand why, we have to understand how CloudFormation works. When a template is deployed, CloudFormation spins up the resources and saves the state and configuration of each resource. This allows CloudFormation to perform its infamous rollback process. It also allows CloudFormation to know what steps to take to perform updates. Now if users have been meddling with resources this all falls apart. Subtle changes in resources can break CloudFormation changes and, even worse, CloudFormation can roll back a failed release to what was in its own state rather than what was running prior to a stack update!

The ability to import resources into CloudFormation is new(ish) and support is very limited. If you have an existing stack - good luck.

This all sounds horrible, but if you think about what CloudFormation is trying to achieve you can make it work in your favour.

My CloudFormation rules:

  • New AWS account
  • Disable access to EVERYTHING except managing CloudFormation
  • Engineers should never make manual changes (and shouldn’t be able to if you follow point 2)
  • CloudFormation is deployed by pipelines / CI/CD
  • If you need a resource not supported by CloudFormation - make it supported using custom resources or wait
  • Don’t be afraid to make large templates that cover an entire environment. If you hit the CloudFormation size limit you probably want another AWS account anyway.
  • CloudFormation needs to be tested in a staging account first

This sounds like a lot of work - what’s the payoff?

  • AWS supported and managed tooling
  • Accurate change sets (these can be very very very important in controlled environments)
  • Very simple and predictable rollback - this means that in theory you will either move to the new state or return to the old state (provided the rollback doesn’t fail; manual resource changes are often the cause of failed rollbacks)

Terraform

I haven’t used OpenTofu in anger yet. It’s likely that everything I say here applies to OpenTofu as well and it’s worthwhile checking out - I just don’t have experience with it yet.

I’m going to write a whole post on using Terraform so I will keep this a little bit more brief. Terraform has a lot of pitfalls and issues (can we please not store secrets in the statefile and have dynamic state backend config kthnkz) but the advantage is that it is much easier for engineers to manage in an ad-hoc manner.

Unlike our CloudFormation scenario above we have:

  • very good resource import support
  • doesn’t use saved state to plan changes (I’m over simplifying here… I know)
  • will try to correct manual changes
  • it provides a much richer language for defining resources which means that engineers don’t need to use abstraction languages to generate it

It does come with its own drawbacks though:

  • doesn’t support rolling back changes. If terraform breaks during a deployment you have to fix it yourself. This isn’t as bad as it sounds, but you will need to factor it into your lifecycle / deployment plans.
  • Using terraform securely is hard. I don’t think any company actually has secure terraform deployments.

If you have an environment with existing resources or engineers that are likely to poke things manually - terraform is a good choice.

HCL, while not perfect, can lead to good grepability. You can still fall down a rabbit hole of module mess (this will be covered in my other post) but if you try to keep your terraform flat then managing bulk changes can be ok.

Pulumi, CDK, CDK for terraform, other code generators

Unless you have a very good reason, just don’t. They seem like a good idea. Your software engineers will think it’s a good idea. However, from experience, every single implementation I’ve seen or worked with has turned into a mess. I think they can work, however they have too many footguns to remain manageable.

Software engineers will often abstract resources to the point that making changes is fragile, complex and time consuming. Finding out how a resource is created and configured requires following chains of abstractions (hope your IDE is configured correctly).

After all of this you still end up with a template. Which means you still need to understand how to debug the template + you have to map the generated template back to the code. So not only do you need to know the tool / code abstractions you still end up with all the pain of managing the output of templates anyway.

The other problem is grepability. When operating large infrastructure deployments there are some tasks where abstractions like this become frustrating. Some examples:

  • Update all S3 buckets with a new security policy
  • Change the TLS policy on all load balancers
  • Find how/where a specific KMS key is configured

When using code generators it can sometimes be hard to perform these tasks. Sometimes easier. But often harder, as everyone will have created their own abstraction for their resources. Compared to that, with a DSL (like Terraform’s HCL or the YAML for CloudFormation) we can search the entire company’s repos for something like aws_s3_bucket and possibly even script changes to the resource. Using a tool like multi-gitter suddenly means that it’s possible to update resources to a new company policy or standard quickly.
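To make the grepability point concrete, here is a minimal sketch of that kind of search in Python. It assumes all the repos are checked out under one directory, and the function name `find_resources` is made up for this example. The regex only handles the common `resource "type" "name"` form, so treat it as a starting point rather than a real HCL parser.

```python
import re
from pathlib import Path


def find_resources(root, resource_type):
    """Return sorted (file, line number, resource name) tuples for every
    `resource "<resource_type>" "<name>"` declaration found under root."""
    pattern = re.compile(
        r'^\s*resource\s+"' + re.escape(resource_type) + r'"\s+"([^"]+)"'
    )
    hits = []
    # Walk every .tf file below root, however deeply nested the repos are.
    for tf_file in Path(root).rglob("*.tf"):
        text = tf_file.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = pattern.match(line)
            if match:
                hits.append((str(tf_file), lineno, match.group(1)))
    return sorted(hits)
```

Calling `find_resources("/path/to/checkouts", "aws_s3_bucket")` gives a flat list of every bucket declaration to review or script against. Doing the same against generated templates usually means first working out which abstraction produced each resource.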


Just a bunch of scanners (JBOS?)

This is the story of how I spent far too much money and time getting a scanner to work over iSCSI so that I could prove “Chris O” wrong on StackExchange. The TL;DR is that yes, scanners work fine over iSCSI.

It begins with waking up to the thought “could I connect a scanner to a SCSI disk array chassis” - probably after watching too much Adrian’s Digital Basement. I think this is very much possible, but the thought morphed into - what about iSCSI arrays? While I joked about it, I put off further investigation when I saw the price for iSCSI bays. They are still stupid expensive.

The idea still stuck with me though and I wanted to do some more research into whether anyone had done it. I didn’t find much. All I really found was Chris saying it couldn’t be done because:

iSCSI isn’t a complete encapsulation of the SCSI protocol over TCP/IP. … Source: 15 years of managing iSCSI storage networks.

This bugged me a lot because it didn’t point to any sources. I decided to read through iSCSI RFCs and couldn’t find anything that would suggest that it wouldn’t work. I suspect Chris hasn’t had as much experience managing iSCSI scanner networks. Being theoretically correct is one thing though - I wanted to be practically correct.

The Setup

A minimal test setup requires three things:

  • iSCSI Target (this is a server for normal people)
  • iSCSI Initiator (this is the client for normal people)
  • A SCSI scanner

Turns out that even though iSCSI arrays are still expensive, tgt (iSCSI Target software) on Linux supports passthrough devices.

So our setup ends up being:

  • Linux tgt iSCSI Target
  • Virtual machine on a different computer as the iSCSI initiator
  • A SCSI scanner

With the discovery of tgt I thought this project would be simple, however a recurring theme with this project is that the simple things end up being the hardest. The first problem is that inexpensive scanners only existed in my head. Turns out most scanners sold 2nd hand today either don’t have SCSI or are pricey. Eventually I came across a CanoScan FS4000US film scanner for a reasonable price.

Canon FS4000US film scanner sitting on my desk. It’s a long upright unit with a flap to insert a slide full of film

The reason it was a reasonable price is that the slide/film mounts were missing. This most certainly wouldn’t be a problem as I don’t need to actually scan film (she thought).

Drivers

There’s a bit of a meme about printers being the worst computer accessory to set up, configure and maintain. I don’t believe this is true. Scanners are far far far worse - this is why you don’t see scanners today - we lost. The FS4000US was released to market sometime around 2001 or 2002 from my research. It received 32bit drivers up to Vista.

My first task was to test the scanner on USB to ensure it worked prior to iSCSI fun. Luckily VueScan still exists and provides a driver replacement for the FS4000US. It was able to detect the scanner. Great win!

A slippery slide

VueScan however would not let me scan as it couldn’t detect the slide/carriage for the media being inserted. When I purchased the scanner I knew the slide was missing. I was hoping that it would just scan without checking, and if it did check it would just need a limit switch locked down. This was not the case - the FS4000US has a loading mechanism (it’s actually fairly complex) and uses a light sensor to perform alignment.

The door of the scanner open with a post-it note where the slide goes in

After many attempts to “trick” the sensor I started designing a 3D printed slide replacement, however before kicking off a print I decided I’d try “one more time”. I managed to trick the sensor into thinking film was loaded and VueScan happily scanned an image!

Screenshot of VueScan scanning in progress

Other parts SCSI or SSCCSSII

It’s time to talk about SCSI. The FS4000US is “SCSI-2”, which at the time was the latest specification for the SCSI standard. Confusingly SCSI-2 refers to two different specifications - Wide and Narrow. Wide came about in 2003, while Narrow came out in 1994. As such the FS4000US is Narrow SCSI. The best way to think of narrow and wide is that wide doubles the bus width - more pins. Because of the additional bus width the number of devices on the SCSI bus changed from 8 to 16.

There are some other considerations with SCSI as well - SE/HVD (single ended / high voltage differential) or LVD (low voltage differential), along with clock speed. The FS4000US is narrow, 10MHz, SE/HVD. Modern SCSI is wide, 160MHz, LVD.

The good news is that most SCSI devices are backwards compatible (usually sacrificing speed on the bus). The only important consideration is usually LVD vs SE/HVD. I hunted down a PCIe SCSI card and ended up with a Dell LSI20320IE. This has an internal SE/HVD connector.

Oh connectors. Connectors. I mentioned the difference between wide and narrow, but here’s where it gets fun. Our FS4000US has a HD50 connector (50 pins) - this is usually a little bit unique but not unheard of. The LSI20320IE has a HD68 connector (68 pins). So how do we connect a narrow device to a wide controller? We buy a dubious cable.

I ended up with this HD68 to HD50 cable. I’m pretty certain this is designed for when you’re connecting a newer hard drive to an older controller. Why would this matter though?

HD68 to HD50 cable

Well as SCSI is a bus you end up with long cables with many devices sharing the cabling. As there’s no dedicated device at the start or end of the bus the signal would reflect off the end of the cable. This is why we have SCSI terminators. Often these are built into devices and as long as you set the terminator on the first and last device everything is fine. Other times you might use dedicated terminators plugged into the cable. These terminators prevent signal reflections and make the bus stable.

Bringing this back to the cable, we have 68 pins on our wide side and 50 pins on the narrow. This means there’s a bunch of pins that are left unterminated. This originally concerned me, but on my very simple SCSI configuration I realised that this would likely be fine. This is because the cable likely only contains 50 pins and the SCSI card itself will be acting as the first terminator - there should be no reflections at this point. On a complex bus with multiple devices this could be a problem though.

I did look into proper 68 to 50 adapters however these are either very expensive or just not available. I guess their usage was fairly rare to start with.

Anyway, my HD68 to HD50 cable worked just fine. Lol just joking. Once again the simple things end up being the hardest. I bought an external cable, but the SCSI controller card has the HD68 connector on the inside of the card. It physically wouldn’t fit. So I took to the cable with a Stanley knife and cutters. Eventually it fit.

HD68 cable connected to the SCSI card in my PC. Its sides have been cut off

Unit testing

It’s usually best to test each part of a complex system individually. My plan was to connect the scanner with SCSI on my Windows machine and do a scan. This was to test that SCSI scanning worked locally before moving to iSCSI.

I downloaded the latest drivers I could find for the SCSI card and promptly got a green screen of death, followed by boot looping Windows. The installer put the 32bit driver on a 64bit system. Windows was not happy. The simple things.

Windows green screen of death with a DPC Watchdog Violation

I tried some 64bit drivers. These installed ok with no boot loop, however I got a “This device cannot find enough free resources that it can use.” error. I never worked out what was happening here.

Fine. I rebooted into Linux. It could see the controller and the scanner - huge win. But no one ever wrote Linux SANE drivers for this scanner. Fine, I thought - I would run a Windows VM, pass through the PCIe device and run a supported Windows Server 2008 R2 setup. Turns out that in version 7 of VirtualBox they removed PCI passthrough. I’ll try virt-manager.

The virt-manager approach saw SeaBIOS hang on boot. I tried to get UEFI working but my Linux build was too old to have the required files. Do I even need to pass through the entire PCIe device though? Maybe I just pass through the SCSI device. The virt-manager GUI however didn’t show a way to pass through the devices, so I learnt how to build the XML myself. Start the VM and get permission denied. Turns out AppArmor was unhappy about it. Fixed up AppArmor only to find the Windows Server build didn’t have drivers for the virtual SCSI controller. Arghhhhhhhhh. Let’s give up with this.

Yolo

Someone in a forum helpfully suggested that rather than using virt-manager for SCSI passthrough I should just use iSCSI. So let’s just do that (sans virt-manager). It shows up on the bus. It’ll be fine, right?

SCSI cable running outside my computer

On my Linux host I started the tgt service and configured the passthrough device. There’s some helpful documentation here. It comes down to these commands:

# Create target 1 with an example IQN
tgtadm --lld iscsi --op new --mode target --tid 1 \
-T iqn.2001-04.com.example:test

# Attach the SCSI generic device for the scanner (/dev/sg1 here) as a passthrough LUN
tgtadm --lld iscsi --op new --mode logicalunit \
--tid 1 --lun 1 --bstype=sg --device-type=pt -b /dev/sg1

# Allow any initiator to connect to target 1
tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL

That’s it. The part I thought would cause me the most grief worked just fine.

Operating system woes

I decided that I wanted to use my laptop to connect to the iSCSI target. macOS doesn’t have native iSCSI, so I decided I would run a virtual machine. Fired up Windows 11 in Parallels, however it seems like iSCSI has been completely removed in Windows 11. Maybe it’s the ARM build I have? It’s the simple things once again.

So I built a Windows 7 VM in UTM. Set up the iSCSI Initiator. Excitement as the scanner shows up in Device Manager. Install VueScan and go to scan. Unlike before I can’t seem to do the magic to make the scanner happy about being loaded. Turns out the logic for checking that there is something loaded into the scanner is performed in software. I suspect that VueScan implements this logic differently between SCSI and USB versions? Simple things.

Screenshot of Windows 7 iSCSI showing the SCSI scanner connected over the Windows iSCSI initiator
Interestingly the device list on the Windows iSCSI initiator does have the name as “Disk”, however I think this is more an issue with the tgt configuration? Not sure.

Scanning

FS4000US scanner on the floor with cardboard holding the door switch down and a bank note inside the scanner

After a lot of messing around I eventually built a Windows XP VM along with downloading some sketchy DLL files. Wildestdreams has some third party software to run the scanner over SCSI. It doesn’t implement any media checking so will happily scan without being loaded!

Wildest dreams ScanApp showing the various scanner configuration options in a very basic app design

With that I was able to perform a scan over iSCSI. Of course the scan quality is rubbish because I’m not using film or the slide and haven’t calibrated the gain for each channel, but the point remains: it scans over iSCSI. Chris O - you’re wrong!

A partially scanned image being viewed on the scanner app

Windows picture viewer showing a scan, along with device manager and the Windows iSCSI initiator software

I did try the original Canon TWAIN driver - I think with the correct slide/tray it would work fine. It’s just very picky about the loading.

The original dream

The original dream was having a bunch of flatbed scanners connected to disk array enclosures. While I can’t afford to see it happen, this experiment gives me hope. There’s no guarantee that it’ll work though, as who knows what those iSCSI controllers are doing.


CODAN Selcall Part 1

We drive long distances in remote places. Sometimes we want people to be able to check-in on us while we are out and about. While cell coverage is getting better we still often find ourselves in places with no coverage.

Our 4x4 with large HF antennas mounted driving through a dry lake

Often we leave our HF radio tuned to a known frequency (usually 7045 kHz LSB) while we are driving and have VSC or power level squelch enabled. However, this has false triggers. When driving through towns there is usually a lot of RF noise, and even when we aren’t there’s noise from our car, other HF radios, or even just ionosondes or CW operators.

Random noises while driving are very frustrating. In the commercial space radios are fitted with selective calling (Selcall) functionality. This allows one radio to call another radio provided you know the ID number of the other radio.

The great thing about standards is that there are so many to choose from - Selcall being no exception. While there are many different Selcall variations, today we are going to talk about the Codan specific Selcall - specifically not the CCIR / UN / WA2 / RDD / Customs variants. Nor aircraft SelCal. The proprietary Codan “standard” is what we want to look at. The reason behind this is that there are numerous Selcall users on 7045kHz already using Codan Selcall and while it’s proprietary we can still interoperate with those users.

Modulating

The Codan Selcall process is actually pretty well documented, with VK5QI already having created an open modulator including some of the less documented features such as channel test modes.

I decided to use freedvtnc2 as a skeleton project as it already has an FSK modem (Codec2), audio and rigctl components. From there I implemented the Selcall protocol while testing and researching some of the extra modes from our Codan 9323. Modulating a Selcall signal was now trivial - but we already knew and had that. What we really wanted was an open source decoder. Something that could detect Selcalls.

Demodulating

Demodulating is a little bit harder, but given that we know what the signal looks like and Codec2 provides a good FSK demodulator it was not outside my skill level.

Codan Selcalls start with a preamble, usually quite long to allow radios scanning channels enough time to detect the signal. To an extent we can ignore the preamble as the Codec2 modem will handle locking onto the signal for us.

After the preamble there is a phasing pattern. This is a header that is at the start of every Codan Selcall. It’s nearly as simple as looking for this pattern to determine when a Selcall has started. But you don’t want to do an exact match, as in HF radio it’s likely that noise has caused some bits to flip. I settled for an 85% threshold, however I think it could be even lower in practice.

Once the phasing pattern is found it’s a matter of decoding the bits as they come in. There are a few fun things in here. Codan uses a 7 bit word and 3 bits of parity, so you have to handle the incoming data as a series of bits rather than your typical bytes. Parity is only at the word level. There is no checksum on the entire message. The data is effectively sent twice (offset) so if parity fails on one word you might be able to recover it from the other.
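A minimal sketch of that fuzzy match, assuming the demodulated bits arrive as a list of 0/1 integers. The function names are made up for this example, this is not the freeselcall code, and the real phasing pattern is not reproduced here, so pass in whatever pattern you are looking for.

```python
def match_fraction(bits, pattern):
    """Fraction of positions where bits and pattern agree (equal lengths)."""
    if len(bits) != len(pattern):
        raise ValueError("bits and pattern must be the same length")
    agree = sum(1 for b, p in zip(bits, pattern) if b == p)
    return agree / len(pattern)


def find_phasing(buffer, pattern, threshold=0.85):
    """Return the index just past the first window of buffer that matches
    pattern at or above threshold, or None if no window is close enough."""
    width = len(pattern)
    for start in range(len(buffer) - width + 1):
        if match_fraction(buffer[start:start + width], pattern) >= threshold:
            return start + width
    return None
```

With a 20 bit pattern and the default threshold, up to three flipped bits are tolerated. Lowering the threshold catches weaker signals, at the cost of more false starts on noise.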

The parity is a count of the number of 0’s in the word.
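Here is a rough sketch of that word handling in Python. The bit ordering is an assumption on my part (7 data bits sent least significant bit first, then the 3 parity bits most significant bit first), and the function names are made up for the example, so check the ordering against a real capture before trusting it.

```python
def zero_count_parity(word):
    """Parity value for a 7-bit word: the number of 0 bits in it (0 to 7)."""
    if not 0 <= word < 128:
        raise ValueError("word must fit in 7 bits")
    return 7 - bin(word).count("1")


def split_symbols(bits):
    """Split a flat list of bits into 10-bit symbols, dropping any remainder."""
    usable = len(bits) - len(bits) % 10
    return [bits[i:i + 10] for i in range(0, usable, 10)]


def decode_symbol(symbol):
    """Decode one 10-bit symbol into its 7-bit word, or None if parity fails.

    ASSUMPTION: data bits come first (LSB first), then parity (MSB first).
    """
    word = sum(bit << i for i, bit in enumerate(symbol[:7]))
    parity = symbol[7] * 4 + symbol[8] * 2 + symbol[9]
    return word if parity == zero_count_parity(word) else None
```

Because the parity value ranges from 0 to 7 it fits exactly in the 3 parity bits, and a single flipped data bit always changes the zero count, so it is always caught.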

For my demodulator it works like this:

  • Store a rolling buffer long enough to fit a Selcall
  • Look for phasing pattern
  • If a phasing pattern is found try to decode each field - check parity - use the redundant field if required and available

Fun fact - if you are really unlucky both copies of a field could return a successful parity check but have different data - as there is no overall checksum for the message you won’t know which one is correct. When receiving a Selcall we just check if either field is our Selcall ID.
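The redundancy handling can be sketched like this. It is simplified to a single field where each copy is a `(word, parity)` tuple, and the function names are made up for the example rather than taken from freeselcall.

```python
def parity_ok(word, parity):
    """True if parity equals the number of 0 bits in the 7-bit word."""
    return 0 <= word < 128 and parity == 7 - bin(word).count("1")


def valid_copies(primary, repeat):
    """Return the words from whichever copies of a field pass parity.

    Each copy is a (word, parity) tuple, or None if it was never received.
    """
    return [c[0] for c in (primary, repeat) if c is not None and parity_ok(*c)]


def recover_field(primary, repeat):
    """Prefer the first copy, fall back to the repeat, else None."""
    words = valid_copies(primary, repeat)
    return words[0] if words else None


def is_for_us(primary, repeat, our_word):
    """True if either parity-valid copy of the field matches our value."""
    return our_word in valid_copies(primary, repeat)
```

In the unlucky case where both copies pass parity but disagree, `recover_field` just picks the first one, while `is_for_us` accepts a match on either. This trades the odd false alert for not missing a call.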

So you decoded a Selcall - what now?

Decoding the Selcall is only part of the process. We need a way of alerting the user that there’s a call for them. It would be really really really nice if ICOM allowed user defined messages (more on this later) and activating a beeper - but in the meantime let’s aim for something simple. Rigctl.

When we send a Selcall we actually need to control the rig - switch the radio to data mode, trigger PTT, return the radio to voice mode. Since we already have rigctl for that, how about we use it for alerting the user as well?

The simplest solution I thought of is using the squelch adjust. On our rig you can configure the USB audio output to remain active even when the radio is squelched on the head unit. So the flow looks like this now.

  1. Squelch the radio to a fairly high level - unlikely to get false triggers
  2. Software listens for the Selcall
  3. When it receives a Selcall for our configured ID number use rigctl to unsquelch the radio
  4. (additionally) if the Selcall requested the channel test mode and the option is enabled send back a test signal
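As a rough sketch of step 3, this is one way to open the squelch from Python by shelling out to Hamlib’s `rigctl` with its `L` (set level) command and the `SQL` level. It is an illustration and not how freeselcall itself talks to the radio. The function names, the model number and the serial device are placeholders you would need to fill in, and I’m assuming your rig’s Hamlib backend supports setting `SQL`.

```python
import subprocess


def squelch_command(model, device, level):
    """Build the rigctl argument list that sets the squelch level.

    level is the normalised squelch value from 0.0 (open) to 1.0.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be between 0.0 and 1.0")
    return ["rigctl", "-m", str(model), "-r", device, "L", "SQL", f"{level:.2f}"]


def handle_selcall(called_id, our_id, model, device, run=subprocess.run):
    """Open the squelch if the call is addressed to us; return True if it was.

    run is injectable so the logic can be exercised without a radio attached.
    """
    if called_id != our_id:
        return False
    run(squelch_command(model, device, 0.0), check=True)
    return True
```

Putting the squelch back up afterwards (step 1) is the same call with a higher level, for example once the user acknowledges the call or after a timeout.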

Freeselcall

I built all of this over a couple of days and have called it freeselcall. It’s had some over the air tests - but I wouldn’t say it’s battle hardened.

I realised that most people, myself included, wouldn’t want to use a terminal to perform selcalls so I’ve also added a web interface. It uses websockets so it should be easy to integrate with other systems as well.

Freeselcall web interface
(It even has browser notifications for those so inclined)

More than just Selcall

Freeselcall allows for sending and receiving Selcalls with various priority types along with the option for channel tests. However Codan radios can do more. They can do paging (short messages).

Today freeselcall can receive pages. It also has code to send pages, however Codan decided to add some security words to the process. As it stands a Codan radio will not receive the page messages from freeselcall.

Terminal prompt showing Codan 9323 boot up logo and some debug output

I’ve spent a lot of time in Ghidra along with building a Codan 9323 emulator (it can display the LCD boot up message!) to help reverse engineer how these security bytes work. Hopefully in part 2 I can share how the paging messages work. Emulating a unique flavour of 8051 along with the hardware it connects to has been fun.

The future (and a terrible idea)

As I alluded to earlier - being able to display a message on our ICOM 7100 would be ideal. However there’s no rigctl or CI-V way of displaying a message on the 7100. Or so I thought. It was only when I was dialing in a repeater on the 7100 that I came across an idea that might work. I haven’t tested this, but here’s the plan.

  1. Dedicate a memory channel for freeselcall.
  2. When a call comes, if the radio is in memory mode, store what memory the radio is in.
  3. Use CI-V command to save the current radio state as a memory in the predefined freeselcall memory channel overwriting any existing configuration in that slot.
  4. Update the channel name with the message / selcall id / etc.
  5. Tell the radio to load that memory
  6. After some time has passed and the radio hasn’t PTT’d in a while, either restore the radio to the memory it was set to or turn off memory mode

I think this will work? But I have no idea until I try.

The other addition I want to make is either implementing more sensible Selcall protocols or building a more modern, robust one. This can run in parallel for backwards compatibility.

I haven’t even looked into 6 digit pagecall. I haven’t seen it used on amateur bands so I’m not overly interested.

A final note….

Part 2 won’t be coming out until I’ve either been able to emulate the Codan firmware (just because that’s cool in its own right) or I’ve been able to figure out the magic for the secret words. The emulator is at the point where I need to have the base and the head unit emulated at the same time and talking via the I2C bus. This shouldn’t be hard, however it’s also not something that is trivial.

Please do not send me yet another copy of the Codan CCIR 493-4 PDF. This afaik doesn’t contain any information on the secret words and I have enough copies of it from people being helpful.