Friday, December 4, 2015

OpenStack Profile and Image Changes and Updates

We've updated both the OpenStack profile and the images it's based on, and wanted to point out some important changes -- and encourage you to migrate to this latest version if at all possible.

First, we've changed the way the scripts install and upgrade OpenStack packages.  Originally, for several reasons, we wrote the profile's setup scripts to always install the latest version of the OpenStack packages.  That no longer makes sense, because 1) the Ubuntu OpenStack packages are pretty stable right now; and 2) we can (and now do) provide a profile parameter that allows you to upgrade to the latest packages if you like.  By not updating the pre-installed packages by default, we significantly reduce the load the scripts put on the Ubuntu package mirrors -- and more importantly, provide a more stable profile that isn't a moving target.  This more reasonable default is long overdue; thanks for waiting for it.  There are now three options that affect package installation on the nodes in your OpenStack experiments; the first is new, and the latter two were previously present.  (A sketch of how options like these are defined in a geni-lib profile script follows the list.)
  • Upgrade OpenStack packages and dependencies to the latest versions
    If a package is already installed, we don't try to upgrade it to the latest version unless this option is selected.  The default is false (unselected).
  • Install required OpenStack packages and dependencies
    If this option is false (unselected), the setup scripts assume all required OpenStack packages are installed, and they install only the critical dependencies they absolutely require for their own execution.  By default, of course, this option is true (selected).  If the "Upgrade" option above is false and the OpenStack packages are already installed, nothing happens; if the packages are not pre-installed and this option is selected, they will be installed.
  • Update the Apt package cache before installing any packages
    This option gives you control over whether or not the scripts update the Apt package cache before doing any package installation or upgrades.  Typically, it's a bad idea to skip the cache update, as you may end up trying to install package versions that are no longer present on the mirrors; but this option gives you that choice just in case.
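
For the curious, here is a rough sketch of how boolean options like these are defined in a geni-lib profile script.  The parameter names below are hypothetical, for illustration only -- check the profile's source for the real ones:

import geni.portal as portal

pc = portal.Context()
# Hypothetical parameter names; the defaults mirror the options above.
pc.defineParameter("doAptUpgrade",
                   "Upgrade OpenStack packages and dependencies to the latest versions",
                   portal.ParameterType.BOOLEAN, False)
pc.defineParameter("doAptInstall",
                   "Install required OpenStack packages and dependencies",
                   portal.ParameterType.BOOLEAN, True)
pc.defineParameter("doAptUpdate",
                   "Update the Apt package cache before installing any packages",
                   portal.ParameterType.BOOLEAN, True)
params = pc.bindParameters()
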
Second, we've updated the CloudLab software installed on the images the profile uses to pull in some important updates.  When you pick Kilo as your OpenStack version, the Ubuntu 15 images your experiment will use now support swap partitions on x86 (most CloudLab images have a swap partition in the standard partition layout, of course, but the right systemd helper services were not installed on Ubuntu 15).  The ARM-specific images (usable only at CloudLab Utah, our ARM-based cluster) use our more modern partition layout, so instead of a whole-disk root partition, there is space for you to create partitions; in particular, the setup scripts create a secondary partition as a backing store for an LVM physical volume.  The ARM-specific images now also have an important CloudLab software update that allows the system boot initramfses to be properly regenerated (important if the kernel is upgraded, or if the installation of some package triggers an automatic rebuild of the initramfs).  Finally, all images have the most recent Ubuntu package versions of OpenStack Kilo and Juno pre-installed (unless you select the option to start "from scratch", which deliberately uses images that don't have the packages pre-installed).

Third, we've improved the profile's documentation a little bit.  When you swap in an experiment, you'll see a Markdown rendering instead of the giant glob of text.  Hopefully this will make things clearer -- although we didn't attempt to document everything.

Finally, we've disabled non-current versions of this profile, meaning that you can no longer instantiate or copy those profile versions.  Please instantiate using the latest version.

Thanks for reading, and please report any problems to cloudlab-users@googlegroups.com (if you're not a member, you should join!).

Friday, November 6, 2015

Changes to the default OpenStack profile

We have three announcements to make today regarding the default OpenStack profile in CloudLab:

  1. The OpenStack profile now randomly generates a new password for every experiment. This password is used for the 'admin' login in the OpenStack web interface and for password ssh logins to VMs created by OpenStack. The password for your experiment can be found in the "Instructions" panel in the experiment status page. 
    [Screenshot: OpenStack profile instructions showing the admin/root password]
  2. It's now easier to tell when OpenStack is done setting up. The Topology view of the experiment status now has little icons on each node showing the status of the scripts that set up that node. Now, when OpenStack setup is complete, all of these icons will change to checkmarks, and the 'State' of the experiment will change from "booted" to "ready". It's normal for the control node to take much longer to finish setting up than the compute or network manager nodes; it has a lot more work to do.
    [Screenshot: profile topology view showing three ready nodes]
  3. We are deprecating the "Tutorial-OpenStack" profile in favor of the "OpenStack" profile. The "OpenStack" profile covers all features offered by the tutorial version, and more. We are not deleting the Tutorial-OpenStack profile at this time, but it is no longer selected by default, and we do not encourage people to use it for new experiments.

Friday, October 23, 2015

Important Tutorial-OpenStack and OpenStack Profile Change

For security reasons, we have changed the CloudLab-provided OpenStack profiles to modify the way root login is handled on the VMs brought up by OpenStack.  This also affects most other OpenStack profiles on CloudLab (those that use our OpenStack setup scripts). Note that this change does not affect CloudLab profiles that do not use OpenStack.

Password login is no longer allowed for the "root" account. If you need to log in directly to your VMs as root, you will need to use an ssh keypair. Password login is still allowed for the 'ubuntu' account, so if you do not have a keypair set up, you may use that account instead.

Tuesday, August 18, 2015

New and Improved OpenStack Profile

We've developed a new and improved OpenStack profile.  It has evolved from the Tutorial-OpenStack profile referred to in the CloudLab manual, but uses newer CloudLab features (geni-lib scripts, profile parameters, and multi-site support), and exposes many more OpenStack configuration options.  Our OpenStack profiles all use stock Ubuntu OpenStack packages insofar as possible to minimize experiment instantiation time.  Here's a brief summary of the new features; more details follow below.

  • Use profile parameters to easily change the number of compute and network resources in your experiment, or control its OpenStack configuration
  • Choose Kilo on Ubuntu 15, or Juno on Ubuntu 14
  • Add compute nodes at a second CloudLab cluster, using CloudLab's beta support for multi-site experiments
  • Try different Neutron network configurations (flat, GRE-tunneled, VXLAN-based, or VLAN-based networks)
  • Use CloudLab's support for creating multiple experiment network links atop a single physical device
  • Better control the management network: choose a VPN over the public CloudLab control net, or over the experiment net (possibly sharing a single physical NIC with other networks in your experiment)
  • Configure several OpenStack features (remote serial console access, security groups, etc)
  • Use "bare" Ubuntu CloudLab images on your physical nodes (or your own custom images), without OpenStack packages preinstalled, or use images with most necessary software preinstalled (the default) to speed up experiment creation
First, the profile is no longer just a large RSpec description of the experiment; it is now based on a geni-lib script (read the CloudLab documentation on geni-lib for more detail).  geni-lib scripts are Python scripts that output an RSpec description of your experiment: using geni-lib classes, you programmatically describe the resources you want in your experiment and configure them---i.e., add nodes, create LANs and links between them, and install software or scripts on them---and when the script is run, it prints out the corresponding RSpec.
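
Here is a minimal sketch of such a script, using the standard geni-lib portal API; the node names are made up for illustration:

import geni.portal as portal

pc = portal.Context()
request = pc.makeRequestRSpec()

# Describe two physical nodes joined by a LAN.
ctl = request.RawPC("ctl")
cp1 = request.RawPC("cp-1")
lan = request.LAN("lan-1")
lan.addInterface(ctl.addInterface("if0"))
lan.addInterface(cp1.addInterface("if0"))

# Print the RSpec XML describing the experiment.
pc.printRequestRSpec(request)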

Second, the profile's geni-lib script makes liberal use of CloudLab's profile parameters.  Parameters can be set by the user of the profile when creating an experiment, and different values can cause the geni-lib script to produce a new, different RSpec.  Of course, each parameter has a default value, so if you don't change any defaults, your experiment will be created using the RSpec generated by running the script with no input parameter values.

If you look at the profile's source (not the RSpec, but the geni-lib source), you'll see a Python script.  It may seem complicated, but much of the complexity is caused by its multiplicity of parameters!  It's commented, so you can look through it, but its basic flow is to 1) define input parameters, default values, and help docs; 2) process any input parameter values and generate errors and warnings as needed; 3) set a description and instructions that are shown to the user at experiment creation; 4) create objects describing experiment resources (nodes, LANs, public IP addresses, etc); 5) add a special "parameter" geni-lib resource, to send several parameter values to the scripts we install on the nodes to change their behavior; and finally, 6) print the RSpec!
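
To make that flow concrete, here is a hedged sketch of steps 1, 2, 4, and 6, with hypothetical parameter names:

import geni.portal as portal

pc = portal.Context()

# 1) Define input parameters, default values, and help docs.
pc.defineParameter("computeNodeCount", "Number of compute nodes",
                   portal.ParameterType.INTEGER, 1)

# 2) Process input parameter values; report errors as needed.
params = pc.bindParameters()
if params.computeNodeCount < 1:
    pc.reportError(portal.ParameterError(
        "You must have at least one compute node.", ["computeNodeCount"]))

# 4) Create objects describing experiment resources.
request = pc.makeRequestRSpec()
request.RawPC("ctl")
request.RawPC("nm")
for i in range(params.computeNodeCount):
    request.RawPC("cp-%d" % (i + 1))

# 6) Print the RSpec!
pc.printRequestRSpec(request)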

When you create an experiment using this profile, after your nodes have booted, each node runs a shell script that enables secure, pairwise root ssh, so that the root user on any node in your experiment can ssh to any other node.  The network manager node ("nm" by default) then connects to each node and 1) configures the management network, and 2) sets up an Open vSwitch configuration by placing the correct physical network devices into Open vSwitch bridges.  (Since some CloudLab clusters have nodes that provide up to 5 experiment network interfaces, the physical Ethernet devices in these bridges may change, even if you create a second experiment with the same parameter values---but the shell scripts that set up OpenStack handle all this for you.)  Finally, the shell script running on the network manager ("nm") node connects to the controller ("ctl") node and begins setting up all the OpenStack services, which itself involves additional configuration of the network manager and of each compute node.

Due to the extra Open vSwitch-based configuration these scripts perform, over and above the default CloudLab configuration that is applied to each experiment, you cannot currently snapshot experiments based on this profile --- if you do, and create a new experiment based on your snapshot, your experiment networks will almost certainly be misconfigured.  If there's a lot of interest in this, we may work to make it possible.

Much of the configuration in these shell scripts comes from the OpenStack installation instructions for Apt-based Linux distributions, and we hope that makes this profile easy to modify if you need to.  Just download the tarball referenced in the profile's geni-lib source code, unpack it, and modify the scripts as you'd like; then create a new profile that uses your tarball instead of the default.  Or, if you'd like to propose a new feature or configuration, you can ask on cloudlab-users@googlegroups.com --- we can't promise to accommodate your request, but we might try.  Please do report bugs to that mailing list, and we'll do our best to fix them.

Wednesday, July 29, 2015

New Feature: Console Logs

Many users have requested the ability to access the console logs, going back to the beginning of their experiment. This can be handy, since the interactive console shell may no longer have what you are looking for if it happened sometime in the past.

On the status page, there is a new context menu item:

[Screenshot: the node context menu with the new "Console Log" item]

When you click on "Console Log", a new window (or tab) is created, and after a few seconds the console log is inserted into the new window. The delay might be as long as five seconds; take a sip of coffee while you wait.

Note that the log does not update in real time, nor can you refresh the window, because of how access security is handled. You need to go back to the topology and the context menu to request a new download. If this becomes a problem, please let us know and we will see if we can do something about it.

Thursday, June 25, 2015

Using OpenFlow in CloudLab

There are options for both software and hardware OpenFlow experimentation using the CloudLab testbed platform.

Part 1: Software OpenFlow:


A good way to get started with OpenFlow is by using Open vSwitch (OVS). The following GENI tutorial shows you how to set up a handful of virtual machines connected by OVS:

  GENI: Intro to OpenFlow

The RSpecs referenced in the above OVS tutorial document should work on the APT cluster available through the CloudLab portal.

Part 2a: General Hardware OpenFlow Information:


If you don't have a concrete reason for using hardware OpenFlow, then we suggest that you stick with a software implementation.  There are several things you must consider carefully if you believe hardware OF is necessary in your experiments:

* Not all OF matching / actions are handled on the switch fast path.

If you are unsure what is meant by "switch fast path", please take some time to study switch hardware architecture before going any further.  Typically, switches can use the fast path to match (via a TCAM or other hardware look-up mechanism) on OF OXMs that correspond to typical Layer 2 packet switching functions, e.g., Ethernet addresses and VLAN tags.

In the case of the HP FlexFabric switches in the CloudLab Utah cluster, the hardware fast path capabilities are enumerated in the JSON info dump linked below, under the section for the table with ID 100. This table corresponds to the "MAC-IP" table in HP's documentation. The FlexFabric switches also support what the documentation calls an "extensibility" table, which has ID 200 in the JSON dump.  This table is more flexible, but is at least partially handled by the switch's general CPU (slow path).  Since the slow path can introduce jitter and latency, depending on current load, using the extensibility table is discouraged; in fact, it may be possible to bog down the switch's CPU and impact other slow path functionality.

* Not everything in the OF spec is supported by any particular switch platform.

Virtually all standards have optional elements.  OpenFlow is no exception.  Switches can support a particular version of OpenFlow, but almost certainly will not support all aspects of it.  For this reason, we highly recommend reviewing the supported match and action rules for each table type that a switch platform supports.  For the HP FlexFabric switches deployed in the Utah CloudLab cluster, please refer to the JSON dump linked below.

* OF controllers have bugs and shortcomings.

When domains on a switch (e.g., vlans) are set up to be managed by an OF instance, it is up to the controller to properly direct traffic.  When problems arise, resist the temptation to conclude that the switch must be at fault. For example, the "simple switch" application that comes with the Ryu controller is extremely limited. It does not properly check for the existence of tables at a particular ID (it assumes ID 0). It does not check whether tables support the OXMs and actions in the flow rules it tries to push. It does not properly handle multi-switch topologies that have a fan-out greater than two. Finally, it does not properly handle, or even report, many errors that attached switches send along.

We do not endorse any particular OpenFlow controller implementation. It is up to you to find a suitable controller and ensure that its implementation conforms to the OpenFlow version that the switches support.  You must also configure or modify it to operate with the parameters set up by CloudLab.  See the next section for details on the Utah CloudLab cluster's switches and how they are configured for OpenFlow.

We ask that you capture and analyze the traffic between the OF controller and the switch(es) when there are issues.  If you run into something that you believe is genuinely the fault of the switch, please send your gathered evidence along to the CloudLab operations list (support@cloudlab.us). You should be able to demonstrate the issue by walking us through a representative packet capture.

Part 2b: Using hardware OpenFlow at the Utah CloudLab cluster:

Note: Please read the previous section before proceeding.

The "CloudLab Utah" cluster offers OpenFlow access to the physical switches (OF instances per vlan). To request this, you can add the following directive to the "link" specification in your Rspec to create an OF instance, which will be pointed at the controller you specify:

<emulab:openflow_controller url="tcp:CONTROLLER_IP:6633"/>

In-line example:

<link xmlns:emulab="http://www.protogeni.net/resources/rspec/ext/emulab/1" client_id="lan-1">
    <interface_ref xmlns="http://www.geni.net/resources/rspec/3" client_id="interface-0"/> 
    <interface_ref xmlns="http://www.geni.net/resources/rspec/3" client_id="interface-1"/> 
    ... 
    <emulab:openflow_controller url="tcp:CONTROLLER_IP:6633"/> 
</link>

Substitute "CONTROLLER_IP" with the IP address of your OF controller host. Adjust the port (6633 above) as necessary.
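
If you generate your RSpec from a script and would rather not hand-edit XML, one possible approach is to post-process the generated RSpec and add the element to each link.  A sketch using only the Python standard library (the file names here are made up):

import xml.etree.ElementTree as ET

RSPEC_NS = "http://www.geni.net/resources/rspec/3"
EMULAB_NS = "http://www.protogeni.net/resources/rspec/ext/emulab/1"
ET.register_namespace("", RSPEC_NS)
ET.register_namespace("emulab", EMULAB_NS)

tree = ET.parse("experiment.rspec")  # hypothetical input file
for link in tree.getroot().iter("{%s}link" % RSPEC_NS):
    ctrl = ET.SubElement(link, "{%s}openflow_controller" % EMULAB_NS)
    ctrl.set("url", "tcp:CONTROLLER_IP:6633")  # substitute your controller IP
tree.write("experiment-of.rspec")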

* Notes on HP FlexFabric (Comware 7) switches and how CloudLab configures OpenFlow:

They are OF 1.3.1 compliant.  Note that this does NOT mean they implement everything in the OF 1.3.1 spec! See the JSON dump below for specific details on which OXMs can be matched, and which actions are available in which tables.

When you ask for OpenFlow as above, CloudLab will set up a "MAC-IP" hardware table at ID 100, and an "extensibility" table at ID 200 for your OF instance.  This is important to keep in mind because some controller software assumes that a table exists at ID 0.
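
For example, with the Ryu controller, a flow rule is pushed to an explicit table ID.  Here is a rough sketch of installing a table-miss rule in table 100 rather than assuming table 0 exists; whether a given table accepts a particular match or action depends on the switch, so consult the JSON dump below:

from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3

class TableAwareApp(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        dp = ev.msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser
        # Install a low-priority "send to controller" rule in table 100
        # (the MAC-IP table), not the possibly nonexistent table 0.
        match = parser.OFPMatch()
        actions = [parser.OFPActionOutput(ofp.OFPP_CONTROLLER,
                                          ofp.OFPCML_NO_BUFFER)]
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, table_id=100, priority=0,
                                      match=match, instructions=inst))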

You must use a publicly accessible (routed) IP address for your controller.  CloudLab cannot currently set up a channel in the private experimental plane for your controller-to-switch communication.  You can run your controller on a CloudLab node by first instantiating a single-node profile, setting up your controller on it, and then instantiating another profile that requests OpenFlow.  Use the public IP address of the controller node from the first experiment in the "openflow_controller" url field in the RSpec of the second profile.

The following document has detailed information, dumped from Floodlight, showing the OF capabilities of CloudLab Utah's Comware 7 (HP FlexFabric) switches:

Thursday, April 16, 2015

New feature: Snapshot in a multi-node profile

Hi all. We have added a feature to make it a bit easier to take snapshots within a multi node profile. This feature is accessed from the per-node context menu; click and you will see a new option in the menu, "Snapshot". 

If you select that, you will be shown the following modal:

[Screenshot: the snapshot modal]

The first check box asks if you want to update your profile after the snapshot has completed, by changing all of the nodes that are running the same image to use the new image. In general, you want to leave this box checked, unless you want to edit the profile source code yourself.

If you have several nodes in your profile, all running different images, and you want to take snapshots of each one, you have to do each one individually via the context menu for each node.

The second checkbox addresses the problem that many users have with accounts and groups getting deleted when they take a snapshot. This typically happens when you install a software package that adds a user and/or group. Those get deleted by default, but if you check this box, they will be retained permanently (you don't need to check the box the next time you take a snapshot). Note that your own user account is always deleted no matter what, to encourage you not to build in dependencies on specific user directories. :-)

The reason this is not the default is that, in general, you do not want to take the chance that an account with a bad password will be propagated forward every time you create a new image.

As always, comments are welcome; we always want to hear what people think.

Persistent Dataset

Hi all. Here is an example of how to create and use one of our persistent dataset types: an image-backed dataset. This is a dataset that you create using a temporary file system, capture, and then reuse in other profiles. Each time you use it, you get a private copy on your node(s) that you can read and write, but which is thrown away when your experiment ends. Note that you can update the "master copy" of the dataset later if needed.

Step 1: Start an instance of the public Temp-Datastore profile. This example profile will provide you with an extra 30GB file system mounted at /mydata.

Step 2: Log in to your node (or use the shell in your browser) and populate /mydata. When you are done, log off the node so that the data can be captured cleanly.

Step 3: Go to Actions->Create Dataset. Choose a name for your dataset and which project to associate it with. Click on Image Backed dataset. From the drop-down menu, choose the Temp-Datastore experiment that you started in Step 1. You will see the next three fields filled in automatically:

[Screenshot: the Create Dataset form with the fields filled in]

Step 4: Click create. There will be a delay of a few seconds; be patient! In a bit you will see the image capture progress modal. When the image capture process is completed, you will see the following (and you will receive an email message):

[Screenshot: the completed image capture modal]

Step 5: Dismiss the modal, and you will see the details of your new dataset. The most important bit of data is the URN of your dataset. This is what you need to use your dataset in another profile.

Step 6: Now make a copy of the Image-Dataset public profile. On the edit page for your new profile, click on the Source button, and then replace the URN in the blockstore tag with the URN of your new dataset (if your profile is geni-lib-based instead, see the sketch after Step 7):

[Screenshot: the profile source showing the blockstore tag]

Step 7: Click on Accept, and then Save your profile changes. Then click on Instantiate. When your experiment has finished setting up, you can log into your node and cd to /mydata to see your data.
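
Incidentally, if your profile is geni-lib-based rather than a raw RSpec, the Step 6 edit is equivalent to setting the dataset URN on a blockstore object in the script. A minimal sketch, with a made-up URN:

import geni.portal as portal

pc = portal.Context()
request = pc.makeRequestRSpec()

node = request.RawPC("node")
# Mount a private, writable copy of the dataset at /mydata.
bs = node.Blockstore("bs", "/mydata")
# Replace this with the URN from your dataset's detail page:
bs.dataset = "urn:publicid:IDN+utah.cloudlab.us:myproject+imdataset+mydataset"

pc.printRequestRSpec(request)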

Step 8: If you later want to update the contents of your dataset, go back to the Show Dataset page shown in Step 5 and click on Snapshot.