Merging a Cisco NX-OS SAN with an IBM Brocade SAN (trying to use NPIV)

Wednesday, 18 Sep 2019

SAN to the Future

Storage Area Networking (SAN) is something I'd guess most Network Engineers have heard of, or had some limited exposure to, but not much more; maybe you've done some zoning for the Storage Guys on your Cisco N5K boxes, but otherwise it's a bit of a dark art. Well, same here - but recently I was posed an interesting problem that, in the IP/Ethernet world, would be a fairly trivial undertaking:

Can we merge our IBM SAN with our Cisco/Hitachi SAN, so that Servers on one can access Storage on the other, and vice-versa?

Ever the idiot optimist, I immediately responded "Sure, that's like 10 minutes of work or something, right?", and so, dear reader, we begin.

Being prepared (FC Learnings)

Optimistic as I am, I've been burned before by dabbling in stuff I don't know well. So a hasty £4 transaction was made on fleeBay to procure this fine tome of knowledge from the early 2000s:

I can highly recommend this book. A few bedtime reading sessions later, and I've already learned an awful lot more about Fibre Channel (FC) and undone some misconceptions I'd brought in from the IP/Ethernet world, like:

  • A Fibre Channel Fabric (a collection of interconnected FC Switches) can only work if each Switch has a unique Domain ID
    • The Domain ID forms the first byte of every FC address (FCID) that the Switch hands out
    • By default, like VLANs, this is Domain ID 1 (on Brocade kit, at least)
    • Two Switches with Domain ID 1 on the same FC Network mean you're gonna have a bad time (one of the FC Switches will be "segmented" from the rest of the world)
  • A SAN Fabric is that whole collection of Switches, where each Switch is its own FC Domain
  • An HBA is a Host Bus Adapter (for FC)
    • This is the NIC of the FC world
  • A CNA is a Converged Network Adapter (for FCoE)
    • This is a NIC, but now it's also an HBA (the "Converged" refers to the fact that the same physical port is both an HBA and a NIC)
  • Normally, there are no more than two SAN Fabrics (A and B) per Deployment of a given set of Compute/Storage Arrays
    • But each SAN Fabric (i.e. the A Leg or B Leg) could have lots of FC Switches within it, in a Hub-and-Spoke setup, where the "Core Switch" is an FC Director-class Switch, and the "Access Switches" are pizza-box FC Access Switches
    • "Ghostbusters Rule" applies here, the two streams (A Fabric and B Fabric) must never cross/talk to each other
  • Fibre Channel comes in 1, 2, 4, 8, 16 and 32 Gbps speeds, typically called "<x> GFC" (i.e. 8 GFC is 8 Gbps Fibre Channel)
    • Our Cisco N5Ks only go up to 8 GFC; I'm convinced 16 and 32 GFC are unicorns
    • Each is its own OSI Layer 1/2 Protocol pairing, although my brain approximates them to the equivalent tier on the OSI Model as, say, 1 Gbps Ethernet vs 10 Gbps Ethernet (i.e. an 8 GFC SFP will normally be backward-compatible with 2 and 4 GFC, as FC optics typically support the two speeds below their own)
      • The headline numbers aren't like-for-like with Ethernet, though: 8 GFC signals at 8.5 GBaud with 8b/10b encoding (roughly 800 MB/s of usable throughput), whereas 10 GbE signals at 10.3125 GBaud with 64b/66b encoding
  • FC uses FSPF (Fabric Shortest Path First), an IS-IS/OSPF-like link-state algorithm, to build its view of the Fabric and pick paths between Switches
    • A large Blue/Red-hatted company who trIed Bloody hard to iMplement this on one of our SANs had completely misunderstood this, and thought that 4x 8 GFC uplinks make 1x 32 GFC uplink (without ISL Trunking, they're four separate paths)
    • You can see which ISL points towards the Fabric's Principal Switch on, say, Brocade kit by looking for the "(upstream)" or "(downstream)" flag in the "switchshow" output
  • FC Interswitch Links are called ISLs
  • FC has sets of features - such as the FC Name Service - not all manufacturers/products support all features
    • This is hard to swallow, as it's a bit like Cisco and Juniper still competing on commonly-done features, at the "Ah yeah, we do Ethernet, but not with STP as an option" level (i.e. you can't take FC features for granted between vendors/products like you can in the IP/Ethernet world)
  • FC has various terms for the types of port (many more than "Access" vs "Trunk")
    • E_Port (Expansion Port) is a Trunk between FC Switches (i.e. an ISL)
    • F_Port (Fabric Port) is an Access Port towards a Server or Array
    • N_Port (Node Port, on the HBA) is the Server/Array Port towards a Switch F_Port
  • All FC Switches in a Domain can see all others and know the topology
    • On Brocade FOS, you can quickly get this with the following CLI (which looks like a reverse of Cisco IOS, with the space between keywords removed):
      • switchshow
        fabricshow
  • Logged-in device information (after a Fabric Login, or "FLOGI") is held in the Fibre Channel Name Service (FCNS), and Zoning in the Zone Server database; each FC Switch automagically shares both with the other FC Switches in the Fabric as soon as either is updated on any one FC Switch
    • I like to think of the Zoning side of this as being to FC what VTP is to VLANs in the IP/Ethernet world
  • World Wide Names (WWNs) are the equivalent of a MAC Address
    • Some are for the physical Port, others are for the Node (Switch/Server/Storage Array) itself
    • As well as the OUI-like "Vendor Identifier" concept on MAC Addresses, WWNs have a "Usage Identifier" to show if that WWN belongs to a Server or Storage Array
  • Logical Units, identified by Logical Unit Numbers (LUNs), are the name for Virtual Disks, which the Storage Array abstracts away onto multiple Physical Disks for redundancy
  • Everybody calls it a SAN Array although really it's a Storage Array
  • Fibre Channel over Ethernet (FCoE) is its own thing, and aside from using the same Ethernet Medium/Cabling, can be viewed as a complete foreigner hitching a lift on the last-mile bit (i.e. Server-to-Switch) of the IP/Ethernet Network
    • FCoE requires a host of other stuff, like DCB/DCBX (Adapters that can negotiate FCoE parameters; Switches that can do something useful with per-priority Ethernet "PAUSE" frames, rather than ignoring them; QoS parameters that prioritise FCoE frames...)
    • There's a reason FCoE never really took off (it's a pain in the arse to do right, even more than FC)
  • Targets (i.e. where the Storage LUN lives, the Storage Array) can't live on the same N_Port as an Initiator (i.e. the Server wanting to put/pull from that Storage LUN)
  • VSANs are another level of abstraction (unnecessary for most) where a VSAN acts as a container for a SAN, which in turn has Zones, which in turn only allow certain FC Aliases (human-friendly names for WWNs) to speak to other certain FC Aliases/WWNs
  • Everything in FC Zoning configs is an Inception-style "mapping to something else, which maps to something else" that only ends when you swallow the blue pill
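The Domain ID point is easier to see with an actual FC address. An FCID is 24 bits, split into Domain, Area and Port bytes, and the Domain byte is the ID of the Switch that handed the address out (hence the need for uniqueness). Here's a minimal PHP sketch (PHP being what I script in further down this blog) that pulls one apart; `decodeFcid` is a name I've made up, and the sample FCID is invented.

```php
<?php
// Minimal sketch: split a 24-bit FCID into its Domain, Area and Port bytes.
// The Domain byte is the Domain ID of the Switch that handed the address out.
function decodeFcid(string $fcid): array
{
    // Accept "0x6e0001" or "6e0001"; anything that isn't 6 hex digits is rejected
    $hex = preg_replace('/^0x/i', '', trim($fcid));
    if (!preg_match('/^[0-9a-f]{6}$/i', $hex)) {
        throw new InvalidArgumentException("Not a 24-bit FCID: $fcid");
    }
    $value = hexdec($hex);

    return [
        'domain' => ($value >> 16) & 0xFF, // which Switch (its Domain ID)
        'area'   => ($value >> 8) & 0xFF,  // often identifies the Switch port
        'port'   => $value & 0xFF,         // often distinguishes logins on that port (e.g. NPIV)
    ];
}

// Invented address, as it might appear in an FCID column
print_r(decodeFcid('0x6e0001'));
```

Here 0x6e means the address came from the Switch with Domain ID 110; a second Switch also claiming 110 would be handing out clashing addresses.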

Applying the theory to reality

Now armed (and definitely dangerous), let's look at what it is we've got in terms of the two SAN Fabrics to merge today, focussing only on the "A Leg" (for visual simplicity, but the same exists again for a "B Leg"):

If you're not familiar with an IBM FlexSystem/PureFlex Blade Server, think of a Cisco UCS but with much less functionality. For those of you unfamiliar with the world of the Blade Switch (you lucky, lucky people): it's a module within the Blade Chassis that takes power/hosting from the Chassis, and has some of its ports as invisible internal ports (e.g. maybe Eth1/1-8 map 1:1 to the respective Backplane NIC on each Server in Blade Chassis Slots 1-8, so Eth1/3 on Blade Switch #1 is NIC0 on Blade Server #3), and others as physically-connected uplinks (e.g. maybe Eth1/9-12 are 4x 1 Gbps Uplinks to the Top of Rack Switch, via 1000BASE-SX Multimode Fibre patch leads).

Relevant for NPIV/NPV (when we get onto it), the IBM FlexStor V7000 is an in-Blade Chassis Storage Array, which utilises some of the Blade Chassis Server Slots, but acts as an FC Target (Storage) rather than a typical Blade Server Compute Node (as an FC Initiator, Compute Server).

As with many things in Large Enterprises, the cool kid unicorns don't exist here; is it daft that we've got two Data Centre Stacks (one IBM and one Cisco/Hitachi) distinct from each other? Absolutely. Would a cool kid hipster DevOps tell me this is impossible in the real world? Probably. Is there a technical reason for it existing? Not at all. Why is it there? Big Company politics and Project silos.

On the IBM kit, it's all re-badged Brocade, running Brocade Fabric Operating System (FOS), namely:

  • IBM SAN24B = Brocade 300
  • IBM FC5022 = Brocade 6547

IBM make this hard to discover, for some reason; I can't think why their Customers have left them in droves since the early 2000s, everyone must be wrong.

Raising Vendor TACs

Looking at the above, you're probably thinking "Not too hard then, cable up some OM3/OM4 8 GFC from the IBM SAN24B to the Cisco N5K, job done?". Sadly, no - there are a few prerequisites, so I'll leverage the expensive IBM-side and Cisco-side Technical Assistance Centre (TAC) Contracts I've got, and cover my back. The main caveat I'm aware of is that every Switch's Domain ID must be unique, so I go around and do the following to glean these:

  • IBM/Brocade
      • Login to each Switch via SSH/Telnet, and issue the following to glean the FC Topology, Domain IDs and SFP Inventory/Status for each Switch
      • fabricshow
        switchshow
        sfpshow
    • Record them all in a big ol' spreadsheet
      • Including the Hostname, which handily for me, Big Blue have made completely different from the sticker on the front of the kit/documentation; thanks for that, IBM - again, it *really* hurts me that you're slowly going under in the Cloud Era, I can't think why your Cloud offering isn't even on the leaderboard...
    • Pull out the FC Alias (human-friendly name:WWN) and FC Zoning information
      • alishow
        zoneshow
    • Record this all in a big ol' notepad
      • Because I think I might have to transpose this into Cisco NX-OS/SAN-OS syntax
  • Cisco
    • Login to each Switch via SSH, and issue the following to glean the FC Topology (not much, there's 1x N5K per SAN Fabric), Domain IDs and SFP Inventory/Status for each Switch
      • show fcdomain
        show vsan membership
        show inventory
    • Record them all in the same big ol' spreadsheet
    • Pull out the FC Alias and FC Zoning information
      • show flogi database
        show zoneset active
        show zone
        show fcalias
    • Record this in a big ol' notepad
      • To get the syntax I need to translate into (Brocade FOS -> Cisco NX-OS)
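Once the Domain IDs are in the big ol' spreadsheet, checking for clashes is a few lines of code rather than squinting. A minimal PHP sketch, assuming you've exported the spreadsheet as hostname/Domain ID pairs; the function name and the hostnames are hypothetical:

```php
<?php
// Minimal sketch: given [hostname, domain_id] rows (a hypothetical export of the
// spreadsheet), return any Domain ID claimed by more than one Switch.
function findDuplicateDomainIds(array $rows): array
{
    $seen = [];
    foreach ($rows as [$hostname, $domainId]) {
        $seen[(int) $domainId][] = $hostname;
    }
    // Keep only Domain IDs used by more than one Switch (keys are preserved)
    return array_filter($seen, fn(array $hosts) => count($hosts) > 1);
}

// Hypothetical hostnames and Domain IDs
$switches = [
    ['ibm-san24b-a', '1'],
    ['ibm-fc5022-a1', '2'],
    ['cisco-n5k-a', '1'],
];
print_r(findDuplicateDomainIds($switches)); // Domain ID 1 is claimed twice
```

An empty result is what you want to see before plugging anything together.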

With Vendor TACs in progress, I go around and complete the above, and am happy that the Domain IDs are unique across all the FC Switches, so a SAN Merge isn't going to cause a problem. Having read this fantastic blog post on Merging Brocade SAN Fabrics, my understanding is that the SAN Fabric with the highest (in ASCII terms, so "Z" trumps "A" for instance) Effective Configuration name (Brocade speak, or "Zoneset Name" in Cisco speak) wins/goes active. As I want to minimise the outage, and have the Cisco N5K "win" as the FCNS Master, my thinking is:

  1. Convert all the Brocade (IBM) FC Aliases/Zones from Brocade FOS into Cisco NX-OS
    1. Easily achieved file-by-file with Notepad++ and some Regular Expressions (RegEx)
  2. Pre-apply this to the Active Zoneset on the Cisco N5Ks
    1. Won't do or harm anything, as the Zones won't go active until the applicable WWNs are seen on the Cisco N5K fabric
  3. Arrange an Outage Window "just in case", and plug in the IBM SAN24B to the Cisco N5K, and allow the ISL to form
  4. Ensure the Cisco Zoneset is active, and no FC Switches have Segmented
    1. Merge them with the applicable CLI command on the Brocade/Cisco if they have
  5. Party on down
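Step 1 above was done with Notepad++ and RegEx, but the same translation can be sketched in PHP. This assumes the `alishow` layout shown in the sample (alias name, then member WWNs separated by semicolons, on the same or following lines), so check it against your own FOS output first; the function names, VSAN number and WWNs are all made up. Zones would need the same treatment (`zone name <zone> vsan <n>` / `member fcalias <alias>`), which isn't shown here.

```php
<?php
// Minimal sketch of the Brocade FOS -> Cisco NX-OS alias translation.
// ASSUMPTION: "alishow" prints " alias: NAME" followed by its member WWNs,
// either on the same line or on following lines, separated by ";".
const WWN_PATTERN = '/(?:[0-9a-f]{2}:){7}[0-9a-f]{2}/i';

function parseBrocadeAliases(string $alishow): array
{
    $aliases = [];
    $current = null;
    foreach (preg_split('/\R/', $alishow) as $line) {
        if (preg_match('/^\s*alias:\s*(\S+)(.*)$/', $line, $m)) {
            $current = $m[1];
            $aliases[$current] = [];
            $line = $m[2]; // members may follow the name on the same line
        } elseif (preg_match('/^\s*(?:zone|cfg):/', $line)) {
            $current = null; // stop collecting if fed cfgshow-style output
        }
        if ($current !== null && preg_match_all(WWN_PATTERN, $line, $found)) {
            foreach ($found[0] as $wwn) {
                $aliases[$current][] = strtolower($wwn);
            }
        }
    }
    return $aliases;
}

// Emit NX-OS fcalias config for the given VSAN
function toNxosFcalias(array $aliases, int $vsan): string
{
    $out = '';
    foreach ($aliases as $name => $wwns) {
        $out .= "fcalias name $name vsan $vsan\n";
        foreach ($wwns as $wwn) {
            $out .= "  member pwwn $wwn\n";
        }
    }
    return $out;
}

// Made-up sample in the assumed alishow layout
$sample = " alias:\tESX01_HBA0\n\t\t10:00:00:00:C9:AA:BB:CC\n"
        . " alias:\tV7000_NODE1\n\t\t50:05:07:68:01:10:00:01; 50:05:07:68:01:10:00:02\n";
echo toNxosFcalias(parseBrocadeAliases($sample), 1);
```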

Response of the Vendor TACs

Cisco are the first to come back; they're not too sure the IBM (Brocade side) will ISL with their N5K kit. Initially, I'm confused: "Surely FC is FC, like Ethernet is Ethernet; if both bits of kit speak FC, even if you've not tested the interoperability, it'll work, right?". Sadly, as per the Brocade Community Forums post on "Can I connect a 300e to a Cisco Nexus 5548", the answer is no for me, because:

  1. I'm running a Brocade FOS version later than 7.0.0
    1. After this point, Brocade disabled the ability to turn on so-called "interop mode", which means it can't ISL with anything other than a Brocade
    2. Without it, FCNS-type stuff, like the ability to specify FC Aliases, will fail miserably on me (and both Cisco/IBM Fabrics already make extensive use of FC Aliases)
  2. Neither Cisco nor Brocade guarantee it will work

So back to the drawing board then; but now running with the suggestion someone made in the Forums about Access Gateway (AG) mode.

Brocade Access Gateway (AG) Mode

Access Gateway is Brocade's renaming of what everyone else calls N_Port Virtualisation (NPV), which relies on the upstream Switch supporting N_Port ID Virtualisation (NPIV) - because, as I'm now finding, FC Vendors are arseholes and don't believe in notions like standardisation or consistent naming. Access Gateway (and NPV, for that matter) basically turns the Brocade Switch (in this case, the FC5022 Blade Switch) into a "dumb FC Hub", which has no configuration/Zoning on it, and consolidates a given number of F_Ports into 1x shared N_Port, such that the upstream Cisco N5K Switch (running NPIV) will see multiple WWNs (Servers) as connected to 1x F_Port (rather than the normal 1x F_Port per WWN). It's better described by The SAN Guy in his Configuring a Brocade Switch for Access Gateway (AG) Mode post, but visually it does this:

Given that I've got enough spare FC ports on the GEMs on my Cisco N5Ks, this is a perfect opportunity to kill off the useless IBM SAN24B Top of Rack (ToR) Switches I've got, and just cable the 4x Uplinks from each IBM FC5022 (Brocade 6547) directly into the Cisco N5K, so I end up with this:

Implementing Brocade AG to Cisco NPIV

I'll need an outage on the Brocade (IBM) side to achieve this, as once Access Gateway Mode is enabled, the Brocade forgets all its FCNS/Config. There is also a very important note in the Brocade Fabric OS Administrator Guide, which basically says FC Initiators and FC Targets can't live on the same N_Port; this matters to me, as I have an IBM FlexStor V7000 Storage Array in the same Blade Chassis as the IBM Flex Compute Nodes (Blade Servers) that want to access it via FC as a LUN. To overcome this, I'll need to ensure the Blade Backplane Ports (F_Ports) of the Blade Compute Nodes are mapped to different N_Ports ("AG Port Groupings") from those the V7000 Arrays are mapped to.
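The Initiator/Target rule above is easy to get wrong by eye across a whole Chassis, so here's a minimal PHP sketch of the check. It assumes you've transcribed the `ag --mapshow` output by hand into an N_Port => F_Ports array, and that you know which F_Ports face the V7000; the function name and port numbers are hypothetical:

```php
<?php
// Minimal sketch: flag any Access Gateway N_Port whose mapped F_Ports include both
// a V7000 (Target) port and a Compute Node (Initiator) port.
function findMixedNPorts(array $agMap, array $targetFPorts): array
{
    $mixed = [];
    foreach ($agMap as $nPort => $fPorts) {
        $targets = array_intersect($fPorts, $targetFPorts); // V7000-facing F_Ports
        $initiators = array_diff($fPorts, $targetFPorts);   // everything else
        if ($targets && $initiators) {
            $mixed[] = $nPort;
        }
    }
    return $mixed;
}

// Hypothetical "ag --mapshow" transcription: N_Port => F_Ports mapped to it
$agMap = [0 => [1, 2, 13, 14], 15 => [3, 4, 5, 6]];
$v7000FPorts = [13, 14]; // hypothetical: the Chassis slots the V7000 sits in
print_r(findMixedNPorts($agMap, $v7000FPorts)); // N_Port 0 needs splitting out
```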

This all looks like:

  1. Cisco N5K preparation (non-disruptive)
    1. Copy-mutate-paste over the Brocade (IBM) FC Aliases and FC Zoning into the Active Zoneset on the Cisco N5K, and activate it in advance ready
    2. Enable "feature npiv" (non-disruptive; not to be confused with "feature npv", which turns the Cisco N5K into a "dumb FC Hub" and is disruptive, as it does to the Cisco side what Access Gateway does to an IBM/Brocade)
  2. Brocade cutover (disruptive/needs an Outage Window)
    1. Re-cable the 4x Uplinks from each IBM FC5022 -> IBM SAN24B to instead go IBM FC5022 -> Cisco N5K
      1. Use OM3/OM4 as it's 8 GFC over a short distance
      2. Cisco-side SFPs are DS-SFP-FC8G-SW
      3. IBM/Brocade-side SFPs are XBR-000147
    2. Take the FC Switch out of the FC Domain
      1. switchdisable
    3. Enable Access Gateway (NPV) Mode on the Brocade (IBM FC5022)
      1. ag --modeenable
    4. Verify Access Gateway (NPV) Mode is done/running on the Brocade (IBM FC5022)
      1. ag --modeshow
    5. Show the port mappings (F_Port -> N_Port), and verify that the V7000 Blade Chassis Ports/WWNs are in differing N_Port Groups to any Blade Compute Servers
      1. ag --mapshow
      2. If they aren't (i.e. WWN from a V7000 and a Blade Compute Node mapped to same N_Port), split them out:
        1. ag --mapdel 0 "13;14"
          ag --mapadd 13 "1;2;5;6"
  3. Cisco N5K post-cutover check
    1. Check copied-over FC Zones using Brocade/IBM WWNs/Hosts are now active (have a "*" against them)
      1. show zone active
        show zoneset active
    2. Check Brocade WWNs are logged into the FLOGI Database
      1. show flogi database
  4. Hit the old IBM SAN24B repeatedly with a large lump hammer and/or baseball bat for all the pain it has caused
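For the FLOGI check, the tell-tale sign that NPIV is doing its thing is several WWNs logged in behind a single N5K interface (one per Blade plus the Access Gateway's own N_Port). A minimal PHP sketch that groups logins by interface; it assumes each data row of `show flogi database` starts with the interface, VSAN, FCID and port WWN separated by whitespace, and the sample output below is invented rather than copied from a real box:

```php
<?php
// Minimal sketch: group "show flogi database" logins by interface.
// ASSUMPTION: data rows look like "fc2/1  10  0x6e0001  <port wwn>  <node wwn>".
function loginsPerInterface(string $flogi): array
{
    $logins = [];
    foreach (preg_split('/\R/', $flogi) as $line) {
        if (preg_match('/^(fc\d+\/\d+)\s+(\d+)\s+(0x[0-9a-f]{6})\s+((?:[0-9a-f]{2}:){7}[0-9a-f]{2})/i', trim($line), $m)) {
            $logins[$m[1]][] = strtolower($m[4]); // interface => port WWNs
        }
    }
    return $logins;
}

// Invented sample, roughly in the shape of the real output
$sample = <<<'EOT'
--------------------------------------------------------------------------------
INTERFACE        VSAN    FCID           PORT NAME               NODE NAME
--------------------------------------------------------------------------------
fc2/1            10    0x6e0000  20:01:00:05:1e:aa:bb:cc 10:00:00:05:1e:aa:bb:cc
fc2/1            10    0x6e0001  10:00:00:00:c9:11:22:33 20:00:00:00:c9:11:22:33
fc2/2            10    0x6e0100  50:05:07:68:01:10:00:01 50:05:07:68:01:00:00:01

Total number of flogi = 3.
EOT;

foreach (loginsPerInterface($sample) as $if => $wwns) {
    printf("%s: %d login(s)%s\n", $if, count($wwns), count($wwns) > 1 ? ' (NPIV)' : '');
}
```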

I've not had the chance to navigate the "Politics of ITIL" (TM) yet to tell you if this is the correct way; I'll let you know.

Getting VIP and Server Farm stats from a Cisco ACE Load Balancer

Sunday, 15 Sep 2019

As a continuing reminder that we don't all work in Cool Kid Hipster Service Mesh-using Companies ("Kuber-Hetes? She's that 'Seven Stages of Grief Model' author, yeah?"), some of us still work in a fairytale land of Managed Service Providers, ITIL and old kit - like the Cisco ACE30 Load Balancer. At $work, I've got four of these bad boys; two per Data Centre (I know, I know - "Psscht, all the Cool Kids do Cloudless now..."), hosted inside a Cisco Catalyst 6500-series Chassis that does little more than provide power and a backplane for these ACE30 Modules.

An ACE what-what now?

An Application Control Engine 30 (because there were 29 before it; that's how Cisco number things) was Cisco's prime Load Balancing offering, right after the ACE20, and about the same time as the ACE4710 Appliance; shortly before Cisco saw the writing on the wall in the ADC market and promptly exited Load Balancing entirely, stage right. But no matter, those of us who work for companies that have been around for longer than twenty minutes will likely have encountered one of these, and as it's so old, might be thinking of moving it to something more modern, classy, and less EoL/EoS.

While we're doing that, we may as well take the opportunity to clean up all the old cruft that has built up in it over the years; or in ACE-speak, that's:

  • Unused Virtual IP (VIP) Addresses
  • Unused Server Farm (SFarm) Pools/IP Addresses
  • Unused Real Server (RServer) IP Addresses

So what better time to pretend you're a DevOps Cool Kid, break out some Scripting-Fu and scrape those stats and figures automagically!

Telnet-scraping: Never as easy as you think

My first attempt was flawed because I assumed Cisco would be up to their old tricks and provide nothing but a Telnet interface - and I wasn't far wrong, because as these things go in the real-world Enterprise (with a variety of MSPs and technical silos running things like Firewalls, Networks, Servers, Data Centres), you get things like:

  • Only one of the two Data Centres lets you through the Firewall to Telnet to those ACE Load Balancers
    • From certain Source IPs in one and other IPs in another
  • Nobody ever bothered to initialise the RSA Keypair, so SSH doesn't work
  • We couldn't afford the separate ANM (Application Networking Manager) NMS-like Solution to monitor all these
    • Because in old Cisco-land, an NMS was just a software 1:1 extension of the product; they ain't making no money having you be able to manage it from one of the many existing Cisco-based NMS Platforms* you've already got
  • Web Browsing to it needs to be done via RDPing to a Box behind the Firewall, and then opening a browser as old as Internet Explorer 9
    • At which point you're met with a hideously basic Page that provides little more than an XML DSD Schema

* = I grew up with CiscoWorks as an NMS for everything; I quickly realised it was just a poorly cobbled-together set of IBM bits and Java crap - and unlike its plucky name, it rarely ever did (work).

So, you start a Telnet scraper script in PHP - easy enough, you've done this before, and have a box able to run PHP and Telnet to the ACE Load Balancers through the poorly-made Firewall rules (by luck, rather than Design). Roughly three hours in, you realise that it's got some weird non-standard Telnet Control Characters everywhere, so your "Expect Scripts" (Send <Username>; Wait n seconds; Send <Password>...) aren't gonna do jack. Hmm, not good; let's go back to the drawing board - didn't it have an HTML UI again?

Get to the Code already!

It does have an HTML UI, but no obvious clue as to what you can do with it... But that DSD Schema download thing, that's XML, isn't it? Why would you provide an XML Schema, unless... *Ten minutes of Googling later* AHHH! It's got an XML-based API! And one that someone has been through this pain with before.

The XML-API

It's not as well documented as the newfangled HTTP-based RESTful APIs, but ignoring the configuration-set options, for "show" commands there are two styles of data retrieval:

  1. Get via a Cisco IOS-like "show" command
    1. xml_cmd=<request_raw>show context | inc Name</request_raw>
  2. Get via an element in the XML DSD hierarchy
    1. xml_cmd=<request_xml context-name="ContextName"><show_serverfarm info-level="detail"/></request_xml>
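Both styles end up as a single `xml_cmd` form field in an HTTP POST. A minimal PHP sketch of building that body for each style; the helper names are made up, and URL-encoding it with `http_build_query()` is my own addition (the script further down sends it raw), on the assumption that the ACE's xml_agent decodes form bodies like any other form handler:

```php
<?php
// Minimal sketch: build the xml_cmd POST body for both request styles.
function aceRawCmd(string $cliCommand): string
{
    // Style 1: an IOS-like show command wrapped in request_raw
    // (assumes the command contains no XML special characters)
    return http_build_query(['xml_cmd' => "<request_raw>$cliCommand</request_raw>"]);
}

function aceXmlCmd(string $context, string $innerXml): string
{
    // Style 2: an element from the XML hierarchy, scoped to a Context;
    // escape the Context name so it can't break out of the attribute
    $ctx = htmlspecialchars($context, ENT_QUOTES | ENT_XML1);
    return http_build_query(['xml_cmd' => "<request_xml context-name=\"$ctx\">$innerXml</request_xml>"]);
}

echo aceRawCmd('show context | inc Name'), "\n";
echo aceXmlCmd('ContextABC', '<show_serverfarm info-level="detail"/>'), "\n";
```

Either string could then be handed to `CURLOPT_POSTFIELDS` in place of the hand-built one.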

Unfortunately, because Cisco's gonna Cisco (much like how their own Business Units rarely seem to talk to each other), the XML API has some inconsistencies, such as the name of the "detail flag" (i.e. "show command detail-flag-here"):

  • Sometimes it might get called "info-level"
    • xml_cmd=<request_xml context-name="ContextABC"><show_serverfarm info-level="detail"/></request_xml>
  • But other times it might get called "info-type"
    • xml_cmd=<request_xml context-name="ContextABC"><show_service-policy info-type="summary"/></request_xml>
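One way to stop tripping over this is to keep a small lookup of which attribute each show element wants, rather than remembering it. A minimal PHP sketch; only the two pairings seen above are listed, and the constant and function names are my own:

```php
<?php
// Minimal sketch: look up the "detail" attribute each show element expects.
// Only the two pairings observed above are known; anything else is an error.
const ACE_DETAIL_ATTRIBUTES = [
    'show_serverfarm'     => ['info-level', 'detail'],
    'show_service-policy' => ['info-type', 'summary'],
];

function aceShowElement(string $element): string
{
    if (!array_key_exists($element, ACE_DETAIL_ATTRIBUTES)) {
        throw new InvalidArgumentException("No known detail attribute for $element");
    }
    [$attribute, $value] = ACE_DETAIL_ATTRIBUTES[$element];
    return sprintf('<%s %s="%s"/>', $element, $attribute, $value);
}

echo aceShowElement('show_serverfarm'), "\n";
echo aceShowElement('show_service-policy'), "\n";
```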

The Script

Finally, onto the script. It's coded in PHP for no other reason than I'm familiar with it; it could easily be ported to a cool kid language like Python, as the concepts are transferable. You'll note from the Input and Output Filename Constants (i.e. ACE_FILE) that it's designed to be run on a Windows box; with PHP on Windows, it's easiest to flip the File Path separators from "\" to "/". Windows accepts either, and forward slashes sidestep backslash escape sequences (like "\f") inside double-quoted PHP strings; the same trick works in Python on Windows:

  • This Path
    • D:\Folder\file.txt
  • Becomes this in a PHP-on-Windows variable
    • D:/Folder/file.txt

Whereas in a *NIX distro, this would likely just be something like:

/home/script/file.txt
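To see why the flip matters in PHP, here's a tiny sketch: inside a double-quoted string, the "\f" at the start of "\file" is a form feed escape, so the backslashed path silently changes. The variable names are just for illustration:

```php
<?php
// In a double-quoted PHP string, "\f" is a form feed escape ("\F" is not an escape)
$broken  = "D:\Folder\file.txt";  // the "\f" in "\file" becomes a form feed character
$forward = "D:/Folder/file.txt";  // Windows accepts forward slashes too
$single  = 'D:\Folder\file.txt';  // single quotes leave these backslashes alone

var_dump(strpos($broken, "\f") !== false);               // bool(true)
var_dump($single === str_replace('/', '\\', $forward)); // bool(true)
```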

Script Inputs

  • CSV file of all ACE Management Details
    • Variable (Constant): ACE_FILE
    • File: ace_ip.csv
    • Type: CSV file
      • Formatted like "ace_hostname,ace_mgmt_ip,ace_user,ace_pass", i.e.:
        • loadbalancer01,10.99.0.1,nmsuser,Pasword2019

Script Outputs

  • CSV file of all VIP stats on all Contexts of all ACE Load Balancers
    • Variable (Constant): OUTPUT_FILE_VIP
    • File: ace_vip_stats_<Year>-<Month>-<Day>.csv
    • Type: CSV file
      • Formatted like "load_balancer,context,name,state,address,protocol,port,curr_conns,drop_conns,hit_count", i.e.:
        • loadbalancer01,ContextABC,CM-VIPABC,OUT-SRVC,10.99.0.2,tcp,443,0,0,0
  • CSV file of all Server Farm stats on all Contexts of all ACE Load Balancers
    • Variable (Constant): OUTPUT_FILE_SF
    • File: ace_serverfarm_stats_<Year>-<Month>-<Day>.csv
    • Type: CSV file
      • Formatted like "load_balancer,context,name,type,state,description,predictor,rserver,address,port,state,curr_conns,total_conns", i.e.:
        • loadbalancer01,ContextABC,SF-Group1,HOST,ACTIVE,"Serverfarm for ServersGroup1",ROUNDROBIN,H.10.98.0.2,10.98.0.2,80,OPERATIONAL,0,165
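The script builds these CSV lines by hand, wrapping the description in quotes, so a description that itself contains a double quote would shift the columns. A minimal PHP sketch of letting `fputcsv()` do the quoting instead; the `toCsvLine` helper and the sample row are made up:

```php
<?php
// Minimal sketch: let fputcsv() do the quoting so a description containing a
// comma or a double quote can't shift the columns. The empty $escape argument
// means embedded quotes are simply doubled (PHP 7.4+).
function toCsvLine(array $fields): string
{
    $fh = fopen('php://memory', 'r+');
    fputcsv($fh, $fields, ',', '"', '');
    rewind($fh);
    $line = stream_get_contents($fh);
    fclose($fh);
    return $line;
}

// A made-up Server Farm row whose description would break naive quoting
$row = ['loadbalancer01', 'ContextABC', 'SF-Group1', 'HOST', 'ACTIVE',
        'Serverfarm for "Group1", v2', 'ROUNDROBIN'];
echo toCsvLine($row);
```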

Script Code

<?php
# Cisco ACE Load Balancer Stats Scraper via XML-API v0.2
# Description: Scrape the Server Farm VIP Stats from all Contexts on a Cisco ACE Load Balancer
# Input: (CSV Header) ace_hostname,ace_mgmt_ip,ace_user,ace_pass
# Author: notworkd.io
# Created: 12-Sep-2019

# Define constants
# Local IP Addresses CSV File
define("ACE_FILE","D:/ace-stats/ace_ip.csv");
# Process ACE Server Farm Stats CSV File
define("OUTPUT_FILE_SF","D:/ace-stats/ace_serverfarm_stats_".date("Y-m-d").".csv");
# Process ACE VIP Stats CSV File
define("OUTPUT_FILE_VIP","D:/ace-stats/ace_vip_stats_".date("Y-m-d").".csv");

# Define variables
$i = 0;
$outputfilea_content = null;
$outputfileb_content = null;

# Main program
# Functions
# Cisco ACE Load Balancer XML-API Call to get Contexts as array
function getCiscoAceApiContexts($ace_ip, $ace_user, $ace_pass) {
 # Initiate cURL Session
 $ch = curl_init();

 # Setup cURL Options
 curl_setopt($ch, CURLOPT_USERPWD, $ace_user.":".$ace_pass);
 curl_setopt($ch, CURLOPT_URL, "http://".$ace_ip."/bin/xml_agent");
 curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
 curl_setopt($ch, CURLOPT_POST, 1);
 curl_setopt($ch, CURLOPT_POSTFIELDS, "xml_cmd=<request_raw>show context | inc Name</request_raw>");
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

 # Perform cURL Data Get
 $curl_data = curl_exec($ch);
 # Close cURL Session
 curl_close($ch);
 
 # Match Context name out, parse as array
 # Format: Name: Admin , Id: 0
 $api_xml = new SimpleXMLElement($curl_data);
 # (\S+) keeps any padding spaces before the comma out of the captured Context name
 preg_match_all("/Name: (\\S+)\\s*, Id(.*)/", $api_xml->exec_command->xml_show_result, $api_regex, PREG_PATTERN_ORDER);
 
 # Return each Context Name as array element
 return $api_regex[1];
}

# Cisco ACE Load Balancer XML-API Call to get Server Farms as array
function getCiscoAceApiServerFarms($ace_ip, $ace_user, $ace_pass, $ace_context, $ace_hostname) {
 # Initialise variables
 $output = null;
 
 # Initiate cURL Session
 $ch = curl_init();

 # Setup cURL Options
 curl_setopt($ch, CURLOPT_USERPWD, $ace_user.":".$ace_pass);
 curl_setopt($ch, CURLOPT_URL, "http://".$ace_ip."/bin/xml_agent");
 curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
 curl_setopt($ch, CURLOPT_POST, 1);
 curl_setopt($ch, CURLOPT_POSTFIELDS, "xml_cmd=<request_xml context-name=\"".$ace_context."\"><show_serverfarm info-level=\"detail\"/></request_xml>");
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

 # Perform cURL Data Get
 $curl_data = curl_exec($ch);
 # Close cURL Session
 curl_close($ch);
 
 # Match Server Farm sf_entry (parent) and Real Servers sf_rs_entry (child) attributes out, parse as array
 # Format sf_entry: name,type,sf_reals,sf_state,sf_reals_active,sf_description,sf_predictor
 # Format sf_rs_entry: sf_realserver,address,rs_port,rs_state,rs_curr_conns,rs_total_conns
 $api_xml = new SimpleXMLElement($curl_data);
 
 # Loop through each sf_entry parent element
 foreach($api_xml->exec_command->xml_show_result->xml_show_serverfarm->sf_entry as $key) {
  # Output to logfile
  echo "  Processing Server Farm (context,name,type,state,description,predictor): [".$ace_context."],[".$key->name."],[".$key->type."],[".trim($key->sf_state)."],[".$key->sf_description."],[".trim($key->sf_predictor)."]... Done\r\n";
  # Loop through each sf_rs_entry child element
  foreach($key->sf_rs_entry as $inner_key) {
   # Output to logfile
   echo "   Processing Server Farm RealServer (rserver,address,port,state,curr_conns,total_conns): [".$inner_key->sf_realserver."],[".$inner_key->address."],[".trim($inner_key->rs_port)."],[".trim($inner_key->rs_state)."],[".trim($inner_key->rs_curr_conns)."],[".trim($inner_key->rs_total_conns)."]... Done\r\n";
   # Augment Output File return string (parent)
   $output .= $ace_hostname.",".$ace_context.",".$key->name.",".$key->type.",".trim($key->sf_state).",\"".$key->sf_description."\",".trim($key->sf_predictor);
   # Augment Output File return string (child)
   $output .= ",".$inner_key->sf_realserver.",".$inner_key->address.",".trim($inner_key->rs_port).",".trim($inner_key->rs_state).",".trim($inner_key->rs_curr_conns).",".trim($inner_key->rs_total_conns)."\r\n";
  }
 }
 
 # Return output CSV
 return $output;
}

# Cisco ACE Load Balancer XML-API Call to get VIPs from a Context as array
function getCiscoAceApiVips($ace_ip, $ace_user, $ace_pass, $ace_context, $ace_hostname) {
 # Initialise variables
 $output = null;
 $i = 0;
 
 # Initiate cURL Session
 $ch = curl_init();

 # Setup cURL Options
 curl_setopt($ch, CURLOPT_USERPWD, $ace_user.":".$ace_pass);
 curl_setopt($ch, CURLOPT_URL, "http://".$ace_ip."/bin/xml_agent");
 curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
 curl_setopt($ch, CURLOPT_POST, 1);
 curl_setopt($ch, CURLOPT_POSTFIELDS, "xml_cmd=<request_xml context-name=\"".$ace_context."\"><show_service-policy info-type=\"summary\"/></request_xml>");
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

 # Perform cURL Data Get
 $curl_data = curl_exec($ch);
 # Close cURL Session
 curl_close($ch);
 
 # Match VIP entry sp_class_map (parent) and Class Maps sp_loadbalance (child attributes) out, parse as array
 $api_xml = new SimpleXMLElement($curl_data);
 
 # Loop through each service_policy (in case a Context has more than one), then each sp_class_map parent element
 foreach($api_xml->exec_command->xml_show_result->xml_show_service_policy->service_policy as $policy) {
  foreach($policy->sp_class_map as $key) {
   # Output to logfile
   echo "  Processing VIP Class Map (load_balancer,context,class_name,vip_state,vip_curr_cons,vip_drop_cons,vip_hits): [".$ace_hostname."],[".$ace_context."],[".trim($key->sp_class_name)."],[".trim($key->sp_loadbalance->sp_lb_vip_state)."],[".trim($key->sp_loadbalance->sp_curr_conns)."],[". trim($key->sp_loadbalance->sp_drop_conns)."],[".trim($key->sp_loadbalance->sp_hit_count)."]... Done\r\n";
   # Loop through each vip-address/etc as child element
   foreach($key->sp_loadbalance->{"vip-address"} as $inner_key) {
    # Output to logfile
    echo "   Processing VIP-Inner (vip_address,vip_proto,vip_port): [".trim($inner_key)."],[".trim($key->sp_loadbalance->{"protocol-type"}[$i])."],[".trim($key->sp_loadbalance->{"match-port"}[$i])."]... Done\r\n";
    # Augment Output File return string (parent)
    $output .= $ace_hostname.",".$ace_context.",".trim($key->sp_class_name).",".trim($key->sp_loadbalance->sp_lb_vip_state);
    # Augment Output File return string (child)
    $output .= ",".trim($inner_key).",".trim($key->sp_loadbalance->{"protocol-type"}[$i]).",".trim($key->sp_loadbalance->{"match-port"}[$i]).",".trim($key->sp_loadbalance->sp_curr_conns).",".trim($key->sp_loadbalance->sp_drop_conns).",".trim($key->sp_loadbalance->sp_hit_count)."\r\n";

    # Increment Line counter
    $i++;
   }
   # Zeroize the incrementer for the next loop
   $i = 0;
  }
 }
 
 # Return output CSV
 return $output;
}

# Procedural
# Output to logfile
echo "Cisco ACE Load Balancer Stats Scraper v0.2\r\n";
echo "==========================================================================\r\n";
echo "JOB START: ".date(DATE_RFC2822)."\r\n";

# Output to logfile
echo "Opening input CSV file...\r\n";

# Iterate through CSV input file and make XML-API calls for each ACE Load Balancer
if(!$fh = fopen(ACE_FILE, "r")) {
 # Output to logfile
 echo " Failed\r\n\r\n";
} else {
 # Output to logfile
 echo " Success\r\n\r\n";
 
 # Add header line to Processed Server Farm CSV file
 $outputfilea_content = "load_balancer,context,name,type,state,description,predictor,rserver,address,port,state,curr_conns,total_conns\r\n";
 # Add header line to Processed VIP CSV file
 $outputfileb_content = "load_balancer,context,name,state,address,protocol,port,curr_conns,drop_conns,hit_count\r\n";
 
 # Loop through each ACE Load Balancer and make XML-API calls to it
 while(($row = fgetcsv($fh, 0, ",")) !== FALSE) {
  # Increment line counter
  $i++;
  
  # Output to logfile
  echo "Processing ACE Load Balancer #".$i." ".$row[0]." (".$row[1].")\r\n";
   
  # Make XML-API calls to get the Contexts, then the Server Farm and VIP stats for each
  foreach(getCiscoAceApiContexts($row[1], $row[2], $row[3]) as $value) {
   echo " Processing ACE Context \"".$value."\"...\r\n";
   $outputfilea_content .= getCiscoAceApiServerFarms($row[1], $row[2], $row[3], $value, $row[0]);
   $outputfileb_content .= getCiscoAceApiVips($row[1], $row[2], $row[3], $value, $row[0]);
  }
 }
 # Close input CSV file
 fclose($fh);
}

# Output to logfile for FileA
echo "\r\nIteration through all input ACE Load Balancers...\r\n Complete\r\n\r\n";
echo "Outputting Processed Server Farm Stats CSV file to ".OUTPUT_FILE_SF."...\r\n";

# Output Processed Server Farm CSV to file
if (!file_put_contents(OUTPUT_FILE_SF, $outputfilea_content)) {
 # Output to Processed Server Farm CSV file failed; output to logfile
 echo " Failed\r\n\r\n";
} else {
 # Output to Processed Server Farm CSV file successful; output to logfile
 echo " Successful\r\n\r\n";
}

# Output to logfile for FileB
echo "Outputting Processed VIP Stats CSV file to ".OUTPUT_FILE_VIP."...\r\n";

# Output Processed VIP CSV to file
if (file_put_contents(OUTPUT_FILE_VIP, $outputfileb_content) === false) {
 # Output to Processed VIP CSV file failed; output to logfile
 echo " Failed\r\n\r\n";
} else {
 # Output to Processed VIP CSV file successful; output to logfile
 echo " Successful\r\n\r\n";
}

# Output to logfile
echo "JOB STOP: ".date(DATE_RFC2822);
?>

The End

There you go; if you liked this (or didn't), or just have some suggestions, feel free to tweet me @notworkd

Network Billy - The world's greatest crap-looking SNMP Upload and Download tool

Saturday, 06 Jul 2019

Is it Billy Network Designer, Billy the Kid or CSNMP-Tools?

I've no idea; when a beaming-faced colleague first showed me this lovely piece of software, I thought it was malware at best, or some Work Experience kid's idea of a joke. Everything about this software looks awful - just look at the GeoCities-inspired installer bitmap:

I know, I know - "DON'T CLICK NEXT! IT'S DODGY, RUN, RUN OR YOUR..." - but stick with it (and let me know whether it's called "Billy Network Designer", "Billy the Kid", "CSNMP-Tools" or "Cisco SNMP Tools", because the Start Menu Icon, Desktop Icon and Windows Titlebar all differ from each other...), because I guarantee this is the finest tool you'll ever use:

Have you been drinking toilet cleaner again?

Drinking, yes; Toilet Duck, no.

This software is fantastic for NetOps-types because of the following killer features:

  • Cisco Tools -> Syslog and Debug Console
  • Cisco Tools -> Cisco Snmp Tool

This software is horrific (for anybody) because of the following (whatever-the-opposite-of-killer-is) "features":

  • Map Mode
  • Network Design
    • One minute in, and all of Visio's line-misalignment quirks are forgiven

Don't use it for those (although it's unusable anyway). Do use it for the SNMP Tool - it's the finest pre-DevOps way (I know, I know - you cool kids can do this with a Python wrapper and a Bash Script) to JFDI a config restore or a running-config download - via SNMP, not CLI.

Wait, did you say SNMP... for config files?

Yes, yes I did:

This tool is the ultimate "Get Out Of Jail Free" card (or, for those of you in the Field, "Get To Pub On Time" card) because it lets you do the following:

  • Download a running-config or startup-config, via SNMP, using the RO or RW Community
  • Upload a running-config or startup-config, via SNMP, using the RW Community
  • Remotely reboot a Network Device, via SNMP, using the RW Community*

* = As long as you had the foresight to put the following CLI command in the active running-config:

snmp-server system-shutdown
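For the curious, here's a minimal sketch of what that SNMP reboot probably looks like on the wire. It assumes the device honours the long-standing OLD-CISCO-TS-MIB object tsMsgSend (1.3.6.1.4.1.9.2.9.9.0), where writing INTEGER 2 means "reload", as Cisco document for IOS; I haven't confirmed that Network Billy sends exactly this. Rather than pulling in an SNMP library, it just builds the argument list for net-snmp's snmpset (SNMPv2c), and the helper name is made up for illustration.

```python
# Hypothetical helper: build a net-snmp `snmpset` command that asks a Cisco IOS
# device to reload. Assumes OLD-CISCO-TS-MIB tsMsgSend, and that the device has
# `snmp-server system-shutdown` configured (otherwise the SET is rejected).
import shlex

TS_MSG_SEND = "1.3.6.1.4.1.9.2.9.9.0"  # OLD-CISCO-TS-MIB::tsMsgSend.0
TS_MSG_SEND_RELOAD = 2                 # nothing(1), reload(2), messagedone(3), abort(4)


def snmp_reload_argv(host: str, rw_community: str) -> list[str]:
    """Return argv for an SNMPv2c SET of tsMsgSend.0 = reload (needs the RW community)."""
    if not host or not rw_community:
        raise ValueError("host and RW community are both required")
    return [
        "snmpset", "-v2c", "-c", rw_community, host,
        TS_MSG_SEND, "i", str(TS_MSG_SEND_RELOAD),
    ]


if __name__ == "__main__":
    # Print the command rather than running it; 192.0.2.1 is a documentation address.
    print(shlex.join(snmp_reload_argv("192.0.2.1", "private")))
    # prints: snmpset -v2c -c private 192.0.2.1 1.3.6.1.4.1.9.2.9.9.0 i 2
```

Printing instead of executing is deliberate: you can eyeball the community string and target before pulling the trigger on a remote reload.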

Why this is so great

Sometimes, Cisco Network devices like to lock up or leak memory. They'll continue to pass traffic (Data Plane), but their management instrumentation (Control Plane) - such as SSH or Telnet - will lock up or die completely, leaving you up the creek without a "conf t" paddle. I've had this plenty of times, with kit of all ages (older C3524-XLs, C3750-Xs, C2960s: nearly one of every non-NX-OS animal), and when it's in some remote Campus or Branch Office, it's not always possible to get a skilled engineer or Console cable there, nor to get someone in to reboot the kit manually.

However, with Network Billy (as we've started calling it), you can quickly send the device an SNMP-based reboot, or even upload the specific delta commands you wanted to apply over SSH/Telnet anyway. As well as this Control Plane lock-out situation, it can also bail you out of:

  • That time when you change the "ip domain-name" but forget to regenerate/zeroise the RSA keypair, and SSH hangs on you
  • That time when you change it to SSH Version 2, but it's crap old kit, so it goes to the unsupported SSH Version 1.99 - which no Terminal Emulator on earth seems able to connect to
  • That time when you lock yourself out of SSH and Telnet login, because you thought it'd be a good Friday to try out the TACACS+/RADIUS/AAA change you've been wanting to do for a while
  • That time when you've got a Layer 2-only Switch, which only allows you to have one Management SVI, and it's time to re-IP it
  • That time when you accidentally applied an ACL to block SSH and Telnet from the WAN-side, instead of the LAN-side
    • This one needs you to get creative, and RDP to a Desktop asset behind the LAN-side of the Switch, and run Network Billy from that to the LAN-side Default Gateway/SVI
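Uploading a delta over SNMP is the trick that saves the most Friday nights. As a rough sketch of how an SNMP config copy can work, assuming the device supports CISCO-CONFIG-COPY-MIB (I haven't checked which MIB Network Billy actually uses, and very old IOS releases used older objects instead), you create a row in ccCopyTable describing a TFTP copy, and setting its RowStatus to createAndGo kicks the copy off. With this MIB at least, creating the row needs the RW community, and copying a file into running-config merges it in, just like `copy tftp running-config`. The helper names, row index and filenames are illustrative, and again it only builds a net-snmp snmpset command.

```python
# Hypothetical helpers: build a net-snmp `snmpset` command that creates a
# CISCO-CONFIG-COPY-MIB ccCopyTable row for a TFTP config copy.
# Assumption: the device supports CISCO-CONFIG-COPY-MIB; not confirmed to be
# what Network Billy uses under the hood.
import ipaddress
import shlex

CC_COPY_ENTRY = "1.3.6.1.4.1.9.9.96.1.1.1.1"  # CISCO-CONFIG-COPY-MIB::ccCopyEntry

# ccCopyEntry column numbers
COL_PROTOCOL = 2          # ccCopyProtocol: tftp(1)
COL_SOURCE_FILE_TYPE = 3  # ccCopySourceFileType
COL_DEST_FILE_TYPE = 4    # ccCopyDestFileType
COL_SERVER_ADDRESS = 5    # ccCopyServerAddress (IPv4 only; deprecated but widely supported)
COL_FILE_NAME = 6         # ccCopyFileName
COL_ROW_STATUS = 14       # ccCopyEntryRowStatus

TFTP = 1
NETWORK_FILE = 1    # networkFile(1); startupConfig(3) is another option
RUNNING_CONFIG = 4  # runningConfig(4)
CREATE_AND_GO = 4


def copy_varbinds(row: int, tftp_server: str, filename: str, restore: bool = False):
    """Return (oid, type, value) triples for one ccCopyTable row.

    restore=False: running-config -> TFTP file (backup).
    restore=True:  TFTP file -> running-config (merged in, like `copy tftp run`).
    """
    if not 1 <= row <= 2147483647:
        raise ValueError("ccCopyIndex must be 1..2147483647")
    ipaddress.IPv4Address(tftp_server)  # raises ValueError if not a valid IPv4 address
    src, dst = (NETWORK_FILE, RUNNING_CONFIG) if restore else (RUNNING_CONFIG, NETWORK_FILE)

    def oid(column: int) -> str:
        return f"{CC_COPY_ENTRY}.{column}.{row}"

    return [
        (oid(COL_PROTOCOL), "i", str(TFTP)),
        (oid(COL_SOURCE_FILE_TYPE), "i", str(src)),
        (oid(COL_DEST_FILE_TYPE), "i", str(dst)),
        (oid(COL_SERVER_ADDRESS), "a", tftp_server),
        (oid(COL_FILE_NAME), "s", filename),
        # createAndGo creates the row and starts the copy once the SET is accepted.
        (oid(COL_ROW_STATUS), "i", str(CREATE_AND_GO)),
    ]


def snmpset_argv(host: str, rw_community: str, varbinds) -> list[str]:
    """Flatten the triples into one SNMPv2c snmpset command (a single SET PDU)."""
    argv = ["snmpset", "-v2c", "-c", rw_community, host]
    for oid, snmp_type, value in varbinds:
        argv += [oid, snmp_type, value]
    return argv


if __name__ == "__main__":
    # Push a file of delta commands ("fix.txt" on TFTP server 192.0.2.10) into running-config.
    vb = copy_varbinds(111, "192.0.2.10", "fix.txt", restore=True)
    print(shlex.join(snmpset_argv("192.0.2.1", "private", vb)))
```

Afterwards you'd poll ccCopyState (column 10) until it reports successful(3) or failed(4), then tidy up by setting the row's RowStatus to destroy(6).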

I could go on - in short, I owe a few of my mortgage payments to Network Billy, and its mysterious Turkish creator.

Installing ESXi onto a Cisco WAVE 594 WAN Optimisation Appliance

Saturday, 09 Feb 2019

Installing ESXi on a Cisco Wide Area Virtualisation Engine Appliance

Why would you want to do this? No real reason, but we've been decommissioning some hardware, and it's pretty clear that Cisco WAVE Appliances are just Compute Servers with a few bits (like the VGA Port) removed. These Appliances were originally designed for CDN-like WanOp purposes, so they have extras like Cavium Crypto/Offload Cards onboard and some SATA storage. So I thought I'd have a go at loading the VMware ESXi Hypervisor onto one.

The box I have is a Cisco WAVE 594, with specifications as follows:

  • Processor - Intel Xeon X3430 @ 2.4 GHz
  • Memory - 8 GB DDR3 RAM
  • Storage - 2x Hot-pluggable 500 GB SATA 7.2k Hard Drives
  • Storage - 1x Internal 4 GB USB Flash Disk
  • Network* - 2x Intel 82574L 1 GbE Network Ports

* = Not detected by ESXi, even though they're on the VMware Hardware Compatibility List (HCL)

What have we got here, Captain?

Here are a few photos of what we've got to work with:

Inside, you'll notice an internal USB port with a 4 GB USB Flash Drive (by some company I've never heard of) plugged into it; outside, you'll notice I've plugged in a USB 3 Ethernet Adapter (which uses the Realtek RTL8152 Chipset).

Port-wise, all we have to play with is:

  • 1x External USB Port
  • 1x micro-USB Console Port
  • 1x RJ45 Console Port (Serial Port)
  • 2x RJ45 1 Gbps Network Ports

What you don't have is a VGA Port, or a spare USB Port to plug a Keyboard into alongside a USB Flash Disk for the ESXi HV/OS Volume, which makes it pretty hard to get through the Next/Next/F11 sequence required to install ESXi.

Time to ask a friend

I was a bit flummoxed at this point, but handily a friend pointed out that ESXi doesn't care about hardware changes after the fact, so I could stage everything by pre-installing ESXi onto the internal 4 GB USB Drive. That's exactly what I did; to do this, I:

  1. Created a Virtual Machine called "USB Test" in VMware Workstation (I know, it's a work machine; I'm normally a VirtualBox man) on my Laptop
    1. Allocated it at least 2x vCPUs with 2x Cores
    2. Allocated it at least 4 GB RAM
  2. Followed this guide on How To USB Boot a VM in VMware Workstation 11
  3. Downloaded ESXi 6.5.0 ISO from VMware vSphere Hypervisor (ESXi) 6.5
  4. Inserted the 4 GB USB Drive
  5. Opened Rufus Bootable USB Maker
  6. Flashed VMware-VMvisor-Installer-6.5.0-4564106.x86_64.iso onto my 4 GB USB Drive
  7. Booted my "USB Test" VM from the 4 GB USB Drive
  8. Followed the ESXi installation process and installed ESXi over the 4 GB USB Drive volume
  9. Rebooted the "USB Test" VM, and attached a "Host-only" Network Adapter to it
  10. Waited for ESXi to boot and receive a 192.168.85.x Host-only IP Address

Now that I've got ESXi built onto the 4 GB USB, I need to tweak a few bits before plugging it into the Cisco WAVE 594. Using the Host-only NIC in VMware Workstation means I can browse to https://192.168.85.x/ui/ from the same Laptop running VMware Workstation to jump onto the ESXi web UI and configure it ("Host-only" means it's a virtual network between just that VM and your Laptop's OS - Windows 7 for me - which sees it as a Virtual NIC).

Making it work without VGA

As well as any other ESXi settings - such as Hostname, vmk0 IP Address and Storage Volumes (although there's no point doing that until this is plugged into the Cisco WAVE 594 itself) - I'll need to tweak ESXi to output its boot screen (VMware call this the Direct Console User Interface, or DCUI; I call it the "yellow and black ESXi boot screen", much catchier) somewhere other than VGA, as the WAVE 594 doesn't have a VGA Port.

This is quite easy: the VGA-like output (i.e. the VMware DCUI) gets redirected to the Serial Port, which in this case is the trusty old blue RJ45 Console Port. To do this, follow VMware's guide, Redirect the Direct Console to a Serial Port Using the vSphere Client:

  1. Log in to the vSphere HTML Client (i.e. https://192.168.85.x/ui/)
  2. Click the Configuration tab
  3. Click Host, then Advanced Settings
  4. Search for parameter VMkernel.Boot.logPort
    1. Make sure it says default (it must not be using com1, which we're about to give to the DCUI)
  5. Search for parameter VMkernel.Boot.gdbPort
    1. Make sure it says default (for the same reason)
  6. Search for parameter VMkernel.Boot.tty2Port
    1. Set it to com1
  7. Click OK
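If you'd rather do this from an ESXi Shell/SSH session than click through the UI, the same kernel boot settings should be reachable through `esxcli system settings kernel`. This is a sketch under the assumption that the esxcli setting names match the Advanced Settings names minus the `VMkernel.Boot.` prefix (logPort, gdbPort, tty2Port); I haven't tested it on the WAVE, so run the list commands first and check. The Python just builds the command strings (the function name is made up), and refuses to hand the DCUI a COM port that logPort or gdbPort is already using.

```python
# Hypothetical helper: build the esxcli command that redirects the ESXi DCUI to
# a serial port. Assumes esxcli exposes the VMkernel.Boot.* settings as
# logPort / gdbPort / tty2Port under `esxcli system settings kernel`.
VALID_COM_PORTS = ("com1", "com2")

# Run these first to read the current logPort / gdbPort values.
CHECK_COMMANDS = [
    "esxcli system settings kernel list -o logPort",
    "esxcli system settings kernel list -o gdbPort",
]


def dcui_serial_command(port: str = "com1", log_port: str = "default",
                        gdb_port: str = "default") -> str:
    """Return the esxcli command pointing tty2Port (the DCUI) at `port`.

    log_port / gdb_port are the values read back via CHECK_COMMANDS; the DCUI
    must not share a COM port with either of them.
    """
    if port not in VALID_COM_PORTS:
        raise ValueError(f"port must be one of {VALID_COM_PORTS}")
    if port in (log_port, gdb_port):
        raise ValueError(f"{port} is already used by logPort/gdbPort; set those back to default first")
    # Kernel boot options, so expect this to apply from the next boot.
    return f"esxcli system settings kernel set -s tty2Port -v {port}"


if __name__ == "__main__":
    for cmd in CHECK_COMMANDS:
        print(cmd)
    print(dcui_serial_command("com1"))
```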

Job done. Now we can simply insert the USB Drive into the internal USB slot, connect our trusty blue Console Cable and USB Adapter to the Console Port, set PuTTY or Screen to 115200 baud*, boot the Cisco WAVE, and wait for the ESXi boot messages and DCUI to flow...

* = If you want to see the WAVE BIOS boot messages, you'll have to set it to 9600 baud first, then switch to 115200 once the output turns into garbage characters.

So close, but yet so far

Remember that asterisk note I wrote before, where VMware lie and say they support the Intel 82574L in their HCL? Well, they don't - and to save you time, they:

  • Don't in ESXi 5.5
  • Don't in ESXi 6.0
  • Don't in ESXi 6.5
  • Don't even when you mess around with custom and obsolete net-e1000e VIB driver packs

Now what? There's not much use in an ESXi Node with no physical networking! This is where the second brainwave kicks in: let's use that USB Ethernet Adapter we've got lying around! Luckily, Jose Gomes had exactly the same idea and wrote a lovely guide on using a USB Ethernet driver for ESXi 6.5, so follow that. For me, this looked like:

  1. Download the Driver VIB for the Realtek USB Adapter
  2. Enable the SSH Service in the ESXi vSphere Web UI (the Service is called "tsm-ssh")
  3. Use FileZilla to log in as "root", and copy the VIB to /tmp/
  4. Follow VMware KB Article 2147650 to disable the newer USB Drivers
  5. Install the custom Realtek VIB, from SSH this command should do it:
    1. esxcli software vib install -v /tmp/r8152-2.06.0-4_esxi65.vib
  6. Reboot ESXi
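For reference, the SSH side of the steps above boils down to a couple of esxcli commands. This sketch assumes the KB's fix amounts to disabling the native vmkusb module (that's my reading of it; double-check against the KB itself) and that the VIB has already been copied to /tmp/. The helper name is hypothetical and it only builds the command list; it checks the path is absolute, because `esxcli software vib install -v` wants a full path to the VIB file.

```python
# Hypothetical helper: the SSH commands behind steps 4-6 above, as a list.
# Assumption: VMware KB 2147650's workaround amounts to disabling the native
# vmkusb driver so the community r8152 driver can claim the adapter.
import posixpath
import shlex


def usb_nic_setup_commands(vib_path: str) -> list[str]:
    """Return the ESXi shell commands to install a USB NIC driver VIB."""
    if not posixpath.isabs(vib_path) or not vib_path.endswith(".vib"):
        raise ValueError("esxcli needs an absolute path to a .vib file, e.g. /tmp/foo.vib")
    return [
        # Step 4: stop the native USB driver from grabbing the adapter
        "esxcli system module set --module=vmkusb --enabled=FALSE",
        # Step 5: install the Realtek driver VIB
        f"esxcli software vib install -v {shlex.quote(vib_path)}",
        # Step 6: reboot so both changes take effect
        "reboot",
    ]


if __name__ == "__main__":
    for cmd in usb_nic_setup_commands("/tmp/r8152-2.06.0-4_esxi65.vib"):
        print(cmd)
```

If the install refuses the VIB on acceptance-level grounds (community drivers usually aren't VMware-signed), `esxcli software acceptance set --level=CommunitySupported` is the usual knob to try first.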

Let's see what we get this time, with our cheapo USB 3 Ethernet Adapter plugged into the front USB port (and the ESXi 4 GB USB in the internal USB port):

Great Success!

There is a caveat here: I find that on reboot, the ESXi DCUI unchecks the "Use vmnic32 for Management" box, so the host isn't contactable from the Network (and won't get a DHCP IP) until you manually press F2, log in to the DCUI and re-enable it. That isn't much use if it's remote and the power goes.

Apparently there's a fix for that in Install ESXi on a server/laptop with only USB Ethernet, using an aptly-named file called "weasel", but I've had stoat-all success getting it to work, so it's a limitation I've just lived with.

As a side note, because we didn't run the interactive installer while ESXi was connected to the WAVE 594 Hardware, you'll need to manually wipe the existing partitions on both 500 GB SATA Disks using the ESXi Datastore -> Storage -> Adapter -> Delete Partition option. You can then set them both up as "New Datastores" to hold VMs as VMDK virtual hard drive files.

Here's a handy guide on How To Erase ESXi Disks With ESXi Host Client v3.

Have fun!
