Sunday, December 8, 2013

A primer on runit using Debian Wheezy - process supervision

Runit is a program for process supervision that can also serve as an alternative to startup scripts.
For the purpose of our example, we are not going to use runit as a complement to init.d.

Why should I use it?

It's easy to use.
It's multiplatform.
It allows you to avoid having to fight with pid files.
It helps you avoid reinventing the wheel.

Installing runit

aptitude install runit
The installer will add the following two lines at the end of your /etc/inittab:
After installing, you'll have two options to start using runit: reboot, or reread your /etc/inittab file by running:
init q
From now on, every time your OS starts it will run the runsvdir daemon. This process is responsible for monitoring the /etc/service/ directory, where we will configure our services. For each service in this directory, the runsvdir daemon will spawn a new runsv process.

Configuring our first service.

Create the service directory, including the log subdirectory that holds the logging script:
mkdir -p /etc/sv/test/log
Create the service script and the script to log everything that happens with the service:
touch /etc/sv/test/run /etc/sv/test/log/run
Give both scripts execute permission. This is very important; otherwise the services won't run:
chmod u+x /etc/sv/test/run /etc/sv/test/log/run
Our really simple 'service' writes to stdout and stderr and dies after 5 seconds:
exec 2>&1
exec bash - <
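The script above is cut off in this copy. As a stand-in, here is a minimal sketch of a run script with the same behavior: it writes to both streams once per second and exits after 5 seconds, so runsv restarts it. The helper name write_run_script and the message text are hypothetical, and the loop runs in the script's own shell instead of handing over to bash as the original does.

```shell
#!/bin/sh
# Hypothetical helper: writes a stand-in run script into the given service directory.
write_run_script() {
    svdir=$1
    mkdir -p "$svdir"
    cat > "$svdir/run" <<'EOF'
#!/bin/sh
# Send stderr to the same place as stdout so the log service captures both.
exec 2>&1
# Print one line per second on each stream, then exit so runsv restarts us.
for i in 1 2 3 4 5; do
    echo "stdout: tick $i"
    echo "stderr: tick $i" >&2
    sleep 1
done
exit 1
EOF
    chmod u+x "$svdir/run"
}

# Usage (as root): write_run_script /etc/sv/test
```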
Note: We use exec so that our command replaces the current shell instead of running as a new child process.

Content of /etc/sv/test/log/run:

exec chpst -u nobody svlogd -ttt /var/log/test/
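svlogd expects its log directory to exist and to be writable by the user it runs as, which is nobody here. A minimal sketch of that preparation step follows; the helper name prepare_logdir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create the log directory and hand it to the logging user.
prepare_logdir() {
    dir=$1
    owner=$2
    mkdir -p "$dir"
    chown "$owner" "$dir"
}

# Usage (as root): prepare_logdir /var/log/test nobody
```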

Our service is configured but not yet under supervision. We need to create a symbolic link to it in the /etc/service/ directory:
pushd /etc/service/
ln -s ../sv/test test

Show time

Everything is set up. Let's run some tests:
# start the service
root@beta:/etc/service# sv start test
ok: run: test: (pid 26967) 0s

# let's check our service for 5 seconds to see how it behaves
root@beta:/etc/service# sv status test
run: test: (pid 26972) 0s; run: log: (pid 2643) 25259s
root@beta:/etc/service# sv status test
run: test: (pid 26972) 1s; run: log: (pid 2643) 25260s
root@beta:/etc/service# sv status test
run: test: (pid 26972) 2s; run: log: (pid 2643) 25261s
root@beta:/etc/service# sv status test
run: test: (pid 26972) 3s; run: log: (pid 2643) 25262s
root@beta:/etc/service# sv status test
run: test: (pid 26972) 4s; run: log: (pid 2643) 25263s
root@beta:/etc/service# sv status test
run: test: (pid 26972) 5s; run: log: (pid 2643) 25264s
root@beta:/etc/service# sv status test
run: test: (pid 26984) 0s; run: log: (pid 2643) 25264s
# the service got restarted automatically!!
root@beta:/etc/service# sv status test
down: test: 5s, normally up; run: log: (pid 2643) 25276s

# now we can stop our service
root@beta:/etc/service# sv stop test
ok: down: test: 1s, normally up

This is it for this first post on runit.

Tuesday, September 3, 2013

Simulating a filesystem with no space left

The situation:

This may sound curious, but yes, I had to simulate a filesystem with no space left in order to reproduce a failure that caused problems with uploads in a PHP application. The upload tmp dir got full and the application stopped uploading files.

The application was running on a VM and there was no LVM. This meant that I would have had to add a disk to the VM, restart the VM, create a new partition... It was too many steps for a simple test. Besides that, I wouldn't use the partition again after the test.

The Solution:

Create a filesystem in a file, use it as a loopback device and fill the filesystem. I thought loopback devices were useful just for mounting ISOs and disk images, but no, they turned out to be really useful in this situation:
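The original commands are missing from this copy. The steps described above can be sketched roughly as follows, assuming GNU dd and e2fsprogs; the image path, the 10 MiB size, the ext3 choice and the function names are illustrative assumptions.

```shell
#!/bin/sh
# Create a zero-filled image file of the given size in MiB (no root needed).
create_image() {
    img=$1
    size_mb=$2
    dd if=/dev/zero of="$img" bs=1M count="$size_mb" 2>/dev/null
}

# Format the image, mount it through a loop device and fill it (needs root).
mount_and_fill() {
    img=$1
    mnt=$2
    mkfs.ext3 -q -F "$img"      # -F: the target is a regular file, not a block device
    mkdir -p "$mnt"
    mount -o loop "$img" "$mnt"
    # dd stops with "No space left on device" once the filesystem is full
    dd if=/dev/zero of="$mnt/filler" bs=1M 2>/dev/null || true
    df -h "$mnt"
}

# Usage (as root):
#   create_image /tmp/php-full-device.img 10
#   mount_and_fill /tmp/php-full-device.img /mnt/php-full-device
```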
That's it. I updated the upload_tmp_dir PHP parameter to:
upload_tmp_dir = /mnt/php-full-device

The developer corrected the bug and I didn't have to add another disk to the VM (I know, I'm lazy).

Bonus Track:

Here are some additional commands that might be useful:
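The original list is missing from this copy. As a stand-in, a few commands that usually help when working with a loop-mounted image are sketched here; the function name teardown_loop_fs and the image path are hypothetical.

```shell
#!/bin/sh
# Inspecting the setup (run as root):
#   losetup -a                      # list loop devices and their backing files
#   df -h /mnt/php-full-device      # confirm the filesystem is at 100%

# Hypothetical helper: unmount the filesystem if it is mounted, then delete the image.
teardown_loop_fs() {
    mnt=$1
    img=$2
    if mountpoint -q "$mnt"; then
        umount "$mnt"               # on older systems, umount -d also detaches the loop device
    fi
    rm -f "$img"
}

# Usage (as root): teardown_loop_fs /mnt/php-full-device /tmp/php-full-device.img
```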
As you can see, this is easy to implement and doesn't require any server restarts.

Saturday, February 16, 2013

Logstash - GROK patterns and nginx access log

I've been using logstash for over a week now and I think it's a really good tool for putting some order into your infrastructure. There are too many file formats and protocols, and not enough time to write all the regexps or parsers you would need to understand what's going on in your platform.

Logstash provides an abstraction layer for complex regexps called grok. It allows you to convert "streams" into fields (chunks of information) that may be queried later. We are going to parse nginx access logs using grok.

Hands on
In my case, I had to start learning GROK (the grok debugger helped me a lot) in order to parse an nginx (0.7.67-3 on squeeze) access log. Here is the log file format (taken from slicehost):

If you need more detail on the nginx log format and related settings, check this.

Here is an example nginx log file:

Now create a directory called "patterns" and create a file "nginx.grok", and put the following pattern there:
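The pattern itself is missing from this copy. As a stand-in, here is a sketch that creates the file with a pattern for nginx's standard "combined" log layout. That layout is an assumption, since the format used in this post is not shown. The pattern name NGINXACCESS and the helper write_nginx_pattern are hypothetical; IPORHOST, USERNAME, HTTPDATE, QS and INT are stock grok patterns.

```shell
#!/bin/sh
# Hypothetical helper: write a grok pattern file for a combined-style nginx access log.
write_nginx_pattern() {
    dir=$1
    mkdir -p "$dir"
    # Quoted heredoc delimiter: the shell leaves %{...} and backslashes untouched.
    cat > "$dir/nginx.grok" <<'EOF'
NGINXACCESS %{IPORHOST:remote_addr} - %{USERNAME:remote_user} \[%{HTTPDATE:time_local}\] %{QS:request} %{INT:status} %{INT:body_bytes_sent} %{QS:http_referer} %{QS:http_user_agent}
EOF
}

# Usage: write_nginx_pattern patterns
```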

I used the same variable names used on the nginx site to create the field names, so it will be easier to track the information in your output.

Now let's put all this to work in logstash:
And here is the resulting JSON data as shown on stdout:

The date

Something really important to remember is that logstash stores all events in GMT using "@timestamp". We sent the event time in the "time_local" field, and using the "date" filter we told logstash to use that field as its timestamp.

You may ask: why does logstash change my events' time? The reason is simple: it allows us to correlate events among boxes in different time zones.

Here is what I sent to logstash in nginx $time_local format:
"time_local":["16/Feb/2013:12:30:20 -0430"]
Here is what logstash stored in ISO8601 format:

As you may notice, there is a time difference of 4 hours and 30 minutes. The reason is that the Linux box's time zone is "America/Caracas", which is -4:30 from GMT.
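The offset arithmetic can be checked from the shell. This is a small sketch assuming GNU date; the helper name to_utc is hypothetical, and the nginx timestamp is rewritten as "YYYY-MM-DD HH:MM:SS ±hhmm" because GNU date does not parse the nginx layout directly.

```shell
#!/bin/sh
# Hypothetical helper: print the UTC equivalent of a local timestamp with a numeric offset.
to_utc() {
    date -u -d "$1" '+%Y-%m-%dT%H:%M:%SZ'
}

# 12:30:20 at -04:30 is 17:00:20 in UTC
to_utc '2013-02-16 12:30:20 -0430'   # prints 2013-02-16T17:00:20Z
```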

I hope this may be useful for you, enjoy!