As explained in the article Dynamic secret management in Puppet, trocla was quickly set up by co-workers in the Claranet Puppet infrastructure for secrets management. Over time there have been several developments, particularly in the choice of backend. Initially tested in memory, it went through several stages (file, MySQL, PostgreSQL) before settling on a stable version, which lasted several years, on a simple MySQL in primary/replica replication. Although the database was very light, it contained more than 50k secrets.
In 2019, after many discussions, a co-worker wanted to test Vault, and potentially migrate the Puppet secret management there. We already had several goals which could only be validated by this migration:
A year went by before we really picked the subject up again, and we started by listing the main difficulties that this migration represented.
After listing all these points, the task turned out to be more complex than expected. From this was born a working group of 3 people, each with their own expertise. We studied several solutions extensively; one of them was to simply add Vault as a new Trocla backend. As we didn't have a Ruby developer or much expertise in the subject, it was not easy to accept poor custom code on such a central element. It was decided to do the work and bring it to the community in order to get a code review. After that, the project was split into 3 main points and we each took the lead on one of them.
A first person, with a senior tech lead profile, who had participated in the initial Trocla implementation, had already taken on the Vault subject. The first step was to set up the architecture. We started with a relatively simple Vault architecture: 2 instances load-balanced locally by HAProxy, with data stored in an internal Consul cluster as the backend. A migration to Kubernetes is being considered eventually. Auto-unseal is also in place, using another Vault instance hosted at a public cloud provider. Finally, performance and load tests were carried out to ensure that everything was ready to handle the Puppet traffic (which, to be honest, we had trouble estimating).
As for the action plan, it was very simple, but required the interruption of operations for a day (we took a non-working day).
This was one of the most complex points for all of us. All Vault access security is based on token management, and the ideal is to have tokens that are as short-lived and as rarely reused as possible. But this does not fit well with use from the Puppet DSL, compiled by the Puppetservers. Indeed, trocla is configured and initialized when the JRuby instances are created, and it was unthinkable to generate new tokens at each call.
So, we chose to create a service token with the following security settings:
With that, we needed to renew the token before its TTL expired, which I set up on the Puppetservers (which hold the service token in the trocla configuration).
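#!/opt/puppetlabs/puppet/bin/ruby
# Renew the Vault token that trocla uses, as configured in /etc/troclarc.yaml.
require 'yaml'
require 'vault'

lock = '/tmp/trocla-token-renew.lock'
# Refuse to run if a previous run is still in progress.
raise format('Lock file %s exists', lock) if File.exist?(lock)

File.open(lock, 'w') {}
begin
  troclarc = YAML.load_file('/etc/troclarc.yaml')
  # Reuse trocla's store options, minus :mount, which is a trocla setting.
  vault = Vault::Client.new(
    troclarc.delete('store_options').reject { |k, _| k == :mount }
  )
  vault.auth_token.renew_self
ensure
  # Always remove the lock, even if the renewal failed.
  File.delete(lock) if File.exist?(lock)
end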
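A lock file that is created and deleted by hand can be left behind if the process is killed, which would then block every later run. As a minimal sketch of an alternative (not what the script above does), Ruby's standard `File#flock` can hold an advisory lock that the kernel releases when the process exits. The helper name `with_exclusive_lock` and the example path are hypothetical.

```ruby
require 'tmpdir'

# Run the block only if no other process holds the lock on `path`.
# Returns the block's value, or nil when the lock is already taken.
def with_exclusive_lock(path)
  File.open(path, File::RDWR | File::CREAT, 0o600) do |lock|
    # LOCK_NB makes flock return false instead of waiting for the lock.
    return nil unless lock.flock(File::LOCK_EX | File::LOCK_NB)

    yield
  end
end

# Hypothetical path, used only for this example.
lock_path = File.join(Dir.tmpdir, 'trocla-token-renew-example.lock')

result = with_exclusive_lock(lock_path) do
  # The real script would call vault.auth_token.renew_self here.
  :renewed
end

puts result.nil? ? 'another run is in progress' : 'renewal done'
```

With this approach the lock file itself may stay on disk; only the kernel-level lock matters, so a crashed run cannot block the next one.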
After that, I just needed to create a systemd timer with Puppet:
$trocla_renew = '/usr/local/sbin/trocla-token-renew'

file { 'trocla-token-renew':
  ensure => file,
  path   => $trocla_renew,
  owner  => 'root',
  group  => 'root',
  mode   => '0500',
  source => "puppet:///modules/${module_name}/trocla/trocla-token-renew",
}
-> systemd::unit_file { 'trocla-token-renew.service':
  ensure  => 'present',
  content => template("${module_name}/systemd/trocla-token-renew.service.erb"),
}
-> systemd::unit_file { 'trocla-token-renew.timer':
  ensure  => 'present',
  content => template("${module_name}/systemd/trocla-token-renew.timer.erb"),
  enable  => true,
  active  => true,
}
trocla-token-renew.timer.erb:
[Unit]
Description=Timer to run the vault token renew
[Timer]
OnBootSec=15min
OnCalendar=*-*-* <%= format('%02d', @hour) %>:<%= format('%02d', @minute) %>:00
[Install]
WantedBy=timers.target
trocla-token-renew.service.erb:
[Unit]
Description=Script to renew the trocla vault token
ConditionFileIsExecutable=<%= @trocla_renew %>
After=network.target
[Service]
Type=oneshot
ExecStart=<%= @trocla_renew %>
Nice=10
IOSchedulingClass=best-effort
IOSchedulingPriority=5
[Install]
WantedBy=multi-user.target
On the day of the migration, everything went well at first. A few keys with specific cases had been identified: keys containing a special character like / or keys with names that were too long. We handled them on a case-by-case basis. The import finished the same way as in preproduction (in about 20 min), but unfortunately, when testing the content on a few nodes, we quickly saw that something was wrong: some secrets had changed. It didn't take long to realize that during the migration we had imported the pre-production data set, taken from a backup that was several weeks old.
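Keys like these can be spotted before the import rather than during it. The following is only a sketch of such a pre-import check, not the procedure we actually followed: the helper `problematic_keys` is hypothetical, and `MAX_KEY_LENGTH` is an arbitrary value chosen for illustration, since the real limit depends on the storage backend. The slash matters because the Vault kv treats / as a path separator.

```ruby
# Hypothetical limit, chosen only for illustration; the real constraint
# depends on the Vault storage backend in use.
MAX_KEY_LENGTH = 255

# Return a hash of the keys that deserve a manual look before import,
# each mapped to the list of reasons it was flagged.
def problematic_keys(keys, max_length: MAX_KEY_LENGTH)
  keys.each_with_object({}) do |key, issues|
    reasons = []
    reasons << :contains_slash if key.include?('/')
    reasons << :too_long if key.length > max_length
    issues[key] = reasons unless reasons.empty?
  end
end

# Made-up key names, for the example only.
keys = ['mysql_root', 'ssl/private_key', 'x' * 300]

problematic_keys(keys).each do |key, reasons|
  puts "#{key[0, 40]}: #{reasons.join(', ')}"
end
```

Running a check of this kind against the source dump gives the list of keys to handle by hand before the import starts.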
Rolling back had been anticipated in the action plan, but we hit a problem when destroying the kv. The kv had too much content and failed to destroy itself correctly. It was after much searching through GitHub issues that we found the solution. We changed the value of the default_max_request_duration parameter so that Consul finally had time to delete all the contents of the kv.
The import was then redone correctly and we were able to finish the migration.
After the migration, we were able to extract more metrics from our Trocla infrastructure. We were surprised to observe 2M calls per day on the Vault kv (for 10M calls per day on the Puppet API), and that adding an extra HTTPS layer had no negative effect on the compilation of Puppet catalogs. On the contrary, Vault turned out to be a better backend than MySQL (we are talking about a gain of a few ms).
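To put these daily totals into perspective, they can be converted into average rates. This is simple arithmetic that assumes the calls are spread evenly over the day, which real traffic is not; the helper name `average_rate` is only illustrative.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Average calls per second for a daily total, assuming an even spread.
def average_rate(calls_per_day)
  calls_per_day.to_f / SECONDS_PER_DAY
end

puts format('Vault kv:   %.1f calls/s', average_rate(2_000_000))   # 23.1
puts format('Puppet API: %.1f calls/s', average_rate(10_000_000))  # 115.7
```

On average, that is roughly one Vault kv call for every five Puppet API calls.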
I then added the Vault key expiration implementation in merge request #71 (with a small fix in #80), allowing us to renew secrets in a very simple way while keeping an accessible history through the Vault web UI.
Finally, unfortunately, we are still waiting for our cases of Trocla Nera wine from our (old) managers.