Service Monitoring

By Snoopy Pfeffer - Posted on 31 December 2013

Dreamland Metaverse uses sophisticated service monitoring tools to ensure that your OpenSim regions and grids run 24/7 with the best service quality.

We constantly monitor the following aspects of all OpenSim processes:

  • Process running    (not running -> restart)
  • Processor load
  • Memory consumption    (memory limit reached -> restart)
  • Port responses    (no port response -> restart)
  • OpenSim region status    (region issues -> restart)

If service monitoring detects a problem, the affected OpenSim process is restarted to fix it. All users who are online are warned 2 minutes before each restart, which allows them to leave the region.
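The checks listed above can be pictured as a simple decision function. The following is only a minimal sketch, not Dreamland Metaverse's actual implementation: the names `ProcessStatus` and `restart_reason` and all field names are hypothetical. Processor load is left out because the list above names no restart action for it.

```python
from dataclasses import dataclass
from typing import Optional

# From the text: users are warned 2 minutes before each restart.
WARNING_SECONDS = 120


@dataclass
class ProcessStatus:
    """Hypothetical snapshot of one OpenSim process (all fields are assumptions)."""
    running: bool           # is the process alive?
    memory_mb: float        # current memory consumption
    memory_limit_mb: float  # memory assigned to the region
    port_responds: bool     # did the port answer?
    region_ok: bool         # does the region report a healthy status?


def restart_reason(status: ProcessStatus) -> Optional[str]:
    """Return why a restart is needed, or None if no check failed."""
    if not status.running:
        return "process not running"
    if status.memory_mb >= status.memory_limit_mb:
        return "memory limit reached"
    if not status.port_responds:
        return "no port response"
    if not status.region_ok:
        return "region issues"
    return None
```

A monitoring loop built on such a function would first warn online users, wait `WARNING_SECONDS`, and only then restart the process.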

Automatic Restarts

When a region restarts, logins are disabled until the region has fully restarted, including all of its scripts. This is done to ensure maximum stability and reliability after region restarts. Depending on the amount of content on a region, this can take between 2 minutes and, for regions with a lot of content, 15 minutes.

This means that, with the automatic monitoring we provide, downtimes of OpenSim regions usually last only a few minutes and, in rare worst cases, about 30 minutes at most.

Manual Restarts

In general, automatic restarts do not happen often, even for busy regions. We recommend that you manually restart your OpenSim regions from time to time, about every 3 to 7 days, especially if you plan bigger events with many visitors.

You can restart your OpenSim regions using the estate management window of the viewer, our web-based control panel or the in-world Dreamland Metaverse terminals.

Unmonitored Mode

Using the control panel or the in-world Dreamland Metaverse terminals, you can start, stop, restart, monitor and unmonitor your OpenSim region.

We recommend that you do not stop your region unless you are sure you want to, for example to restore a database backup. In grids like OSGrid, there is a risk that someone else takes your region name or grid location while your region is down, so keep downtimes as short as possible.

We support a so-called unmonitored mode, in which the automatic monitoring described above is switched off. The OpenSim region continues to run, but problems are not detected, and if the process crashes, it is not restarted automatically. For this reason, you should not use unmonitored mode for a long time without manually monitoring the region yourself.

Loading & Saving OARs & IARs

Unmonitored mode is useful if you load or save very big OAR or IAR files. Loading or saving such archive files may require more memory than is assigned to your region. In addition, processor load can be very high for more than 15 minutes, which could otherwise cause an automatic restart. This is why we advise temporarily switching your region to unmonitored mode while performing such operations.

Be sure to switch monitoring back on after loading or saving OAR or IAR files.
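If you script such maintenance tasks yourself, one way to make sure monitoring is always switched back on is a context manager. This is only a sketch under assumptions: `set_monitoring` is a hypothetical callback standing in for whatever you use to toggle monitoring (the control panel and in-world terminals are operated by hand, and no scripting interface for them is described here).

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def unmonitored(set_monitoring: Callable[[bool], None]) -> Iterator[None]:
    """Switch monitoring off for the duration of a block, then back on.

    `set_monitoring` is a hypothetical callback: it receives False to
    switch monitoring off and True to switch it on again.
    """
    set_monitoring(False)
    try:
        yield
    finally:
        # Runs even if the OAR/IAR operation fails, so monitoring
        # is never left switched off by accident.
        set_monitoring(True)
```

The `finally` clause is the point of the design: monitoring is re-enabled whether the load or save succeeds or raises an error.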

Big Events

Unmonitored mode is also helpful if you plan very big events with many visitors. While you attend the event, you can manually monitor the performance of the OpenSim region yourself.

Statistics Bar

When you monitor manually, it is best to open the Statistics Bar window using the menu View > Statistics Bar. Check the Sim FPS and Physics FPS values. If these values drop to nearly zero, your region needs a restart, because avatars can no longer move or their movements no longer stop. Also check whether in-world chat still works and whether you experience any other problems. Whenever necessary, you can restart the OpenSim region manually.
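The rule of thumb for the two FPS values can be written down as a small check. This is an illustrative sketch only: the function name `needs_restart` is hypothetical, the threshold of 1.0 is an assumed reading of "nearly zero", and treating a drop in either value as sufficient is also an assumption.

```python
# Assumed reading of "nearly zero"; adjust to your own experience.
NEAR_ZERO_FPS = 1.0


def needs_restart(sim_fps: float, physics_fps: float,
                  threshold: float = NEAR_ZERO_FPS) -> bool:
    """Return True if either value read from the Statistics Bar is near zero."""
    return sim_fps <= threshold or physics_fps <= threshold
```

For example, `needs_restart(0.2, 45.0)` returns `True`, while `needs_restart(55.0, 55.0)` returns `False`.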

The advantage of unmonitored mode is that your region can use more memory than is assigned to your process. This allows you to temporarily support more visitors when you are close to the memory limit.

Processor load can stay at the maximum continuously if some process threads are stuck. This causes problems, but it often takes a while before people start to notice them. Manual monitoring allows you to delay a restart that would otherwise happen automatically until a later time. This way you can often avoid inconvenient interruptions of your event.

Be sure to switch monitoring back on after such events.