After my CCIE R&S bootcamp with Brian Dennis (reviewed a few posts ago / last words), I got hooked on the idea of understanding how PFR (Performance Routing) / OER (Optimized Edge Routing) really works. Before that, I had avoided the topic for a myriad of reasons. I have done some labbing, reading and testing, and I want to share my simplified understanding of this feature so you can use it in your studies to pass the CCIE lab. I’m going to explain it in two posts: this one covers most of the theory and the basic configuration, and the next one will be a little more advanced.
In current networks, we basically work with a very “static” decision process, based mostly on the bandwidth declared on a link. Standard setups can’t take into account usage, delay, efficient load distribution (underutilized links) and, more importantly, actual reachability of the destination network. This can lead to black-holed traffic and to inefficient forwarding over a link that suffers from overutilization, loss or delay.
PFR/OER is a technology designed to address these issues and basically achieve:
- Routing based on the actual behavior of the network, and not a statically defined metric.
- Minimize the risk of black-holing traffic by testing actual reachability.
- Effective load sharing and efficient utilization of network resources.
- Possibility to define specific profiles for certain traffic types and/or applications.
General overview of the infrastructure (How it works)
PFR/OER is implemented around two entities: the Border Router (BR) and the Master Controller (MC). Both can co-exist on the same physical device, but the topology must have at least two external links configured (if you have only one device, it needs at least two external links).
- The Border Router (BR) does the actual switching of the traffic and collects NetFlow and IP SLA probe information for the Master Controller.
- The Master Controller (MC) analyzes all the data that the BRs collect (NetFlow/IP SLA) and decides how the flows have to go, pushing these changes to the BRs.
In short, the BR gathers statistics about the behavior of the network and sends them to the MC, which analyzes the data and makes a decision. If a routing change is needed, the MC pushes that information to the BR so it can actually modify the routing behavior. The MC does not directly affect the forwarding path of the traffic; the BRs do. They are the executing hand for the changes that the MC orders. Let’s say, for example, that an IP SLA probe on one BR measures high delay (more than 250 ms) for a specific prefix. The BR sends that information to the MC, which compares it with a configured “rule” and decides whether a routing change is needed. If so, the MC “tells” the BR to change the forwarding path of this specific prefix, for example by injecting a static route that points to another link with less delay.
Performance Routing engine:
PFR/OER works in 5 phases:
- Learning: The MC tells the BR to learn/discover the traffic classes based on destination prefix, ports, NBAR applications, DSCP, etc.
- Measuring: For the traffic classes learned, PFR/OER measures their performance with NetFlow information and IP SLA probes (depending on the configuration). The measured values can be delay, MOS, availability, jitter, etc.
- Applying Policy: With the measurements done, the MC decides whether the traffic classes are IN POLICY, based on the configured parameters, or OUT OF POLICY. If an alternate path exists, it is also checked to confirm that it meets the configured requirements.
- Enforcing: Modify the routing behavior by injecting static routes, working with the BGP local preference, forcing policy-based routing, etc.
- Verifying: Check that the new enforced change is compatible with the defined policy.
Important things easily forgotten when configuring PFR/OER
You need to keep in mind the following details (they are not that clear in the documentation):
- Parent route: For the MC to be able to tell the BR to inject a static route for a specific prefix, the BR needs to be able to route through that new exit point in the first place, so a route covering the prefix via that exit must already exist in the routing database. It’s important to understand that this route does not need to be active in the routing table, so a floating static route, for example, would do the trick. Imagine that we have R1 with two external interfaces: f0/0 (10.0.124.1) as the main link and s0/0 (10.10.10.2) as the secondary link. A valid configuration using static routes would be:
ip route 0.0.0.0 0.0.0.0 10.0.124.1
ip route 0.0.0.0 0.0.0.0 10.10.10.2 20
If you forget the floating route in this kind of setup, it simply won’t work. Your setup may include BGP, OSPF, etc., not only static routes. Just be sure that the requirement is covered.
- Access-list: ACLs are not used to match IP addresses, but protocol/port information. It’s very easy to make a mistake here. Just remember: why use an ACL when you have a prefix-list to match addresses? If you need to match on both a prefix and a protocol, use both or, better, use NBAR and a prefix-list.
- Same IP segment: If you are going to base the routing decision on protocol information and not just on the IP prefix, the BRs need to be on the same IP segment. This is because PBR (policy-based routing) is used for this kind of manipulation. PBR changes the next hop of the traffic as it is received on the incoming interface, and if the BRs don’t share the segment, the packets may end up looping. A possible solution, if the routers are in different subnets, is to use tunnels.
- You need at least two exit points: PFR would not make any sense if it can’t change the behavior of the network. To be able to do that, you need a minimum of two exit points in the setup. It does not matter if both of them are on the same device, but they must exist and be defined in the configuration. Makes sense, right?
- Version: The PFR/OER version running on the master controller needs to be greater than or equal to the one running on the BRs. Otherwise, you may see some weird behavior.
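To make the access-list point above more concrete, here is a minimal sketch of matching a traffic class on protocol/port information versus on an application restricted to a prefix. The names TELNET-TRAFFIC, CRITICAL and APP-POLICY and the prefix 192.0.2.0/24 are placeholders of mine, not part of the lab, and the exact keywords should be checked against your IOS release.

```
! Prefix-list: matches the destination addresses (placeholder prefix)
ip prefix-list CRITICAL seq 5 permit 192.0.2.0/24
!
! Extended ACL: used here only for protocol/port information
ip access-list extended TELNET-TRAFFIC
 permit tcp any any eq telnet
!
pfr-map APP-POLICY 10
 ! Traffic class defined by protocol/port only
 match traffic-class access-list TELNET-TRAFFIC
!
pfr-map APP-POLICY 20
 ! Traffic class defined by an application, limited to one prefix
 match traffic-class application telnet prefix-list CRITICAL
```

Each pfr-map sequence takes a single match clause, which is why the two variants sit in separate entries. On releases with NBAR integration, an `nbar` keyword after `application` should give the NBAR-based version of the second entry.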
I’m going to show a basic configuration, and in future posts we will work on a more “advanced” one.
In this setup we are going to have 2 border routers (BR) and 1 master controller (MC) on the topology. The network has a lot of traffic flowing but we are just going to focus on the prefix 126.96.36.199/24. Let’s imagine this prefix involves very critical traffic for our network, so we want to give it the best possible treatment. All traffic going that way belongs to an interactive application, so it must go with the lowest delay possible.
We are then going to configure a policy so that this critical traffic gets moved to another link if the measured delay is higher than 80 ms.
Define the infrastructure:
The configuration on the BR is very simple: you just have to specify the “local” interface, that basically defines which interface is going to be used as source for the connections, and a key chain to define the password that is going to be used to authenticate the messages:
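As a rough sketch, the BR side could look like the lines below. The key chain name PFR, the key-string CISCO, the Loopback0 source interface and the MC address 10.1.1.1 are all assumptions of mine for illustration; only the structure (a key chain plus the `local` and `master` statements) follows the description above.

```
! Key chain used to authenticate the MC <-> BR messages (placeholder values)
key chain PFR
 key 1
  key-string CISCO
!
pfr border
 ! "Local" interface: source of the connection to the MC (assumed Loopback0)
 local Loopback0
 ! Address of the MC (hypothetical) and the key chain defined above
 master 10.1.1.1 key-chain PFR
```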
It’s on the master controller where all the magic happens. For the infrastructure you need to define the key chain and the BRs of the topology. For each BR you need to define which of its interfaces are inside the PFR/OER domain, and their roles: internal for the inside of the network and external for the outside.
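A minimal sketch of that MC infrastructure, under some assumptions: the BR addresses 10.4.4.4 and 10.5.5.5 and the external interface Ethernet0/1 are the ones that show up in the log messages later in this post, while the internal interface Ethernet0/0, the key chain name and the key-string are placeholders I picked.

```
! Same key chain as on the BRs (placeholder values)
key chain PFR
 key 1
  key-string CISCO
!
pfr master
 ! First BR and the role of each of its interfaces
 border 10.4.4.4 key-chain PFR
  interface Ethernet0/0 internal
  interface Ethernet0/1 external
 !
 ! Second BR
 border 10.5.5.5 key-chain PFR
  interface Ethernet0/0 internal
  interface Ethernet0/1 external
```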
After this configuration you should see the BRs come up and reach the “ACTIVE” status:
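On releases that use the `pfr` keyword, the BR status is usually checked from the MC with the command below (the same information also appears near the top of “show pfr master”). Look for ACTIVE next to each BR address.

```
! Run on the MC
show pfr master border
```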
Inside the “pfr master” configuration you can define general policies that apply to all traffic. If you define more specific ones later, they override the general ones (as we are going to see next). Example:
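A hedged sketch of what such general policies might look like. The values are arbitrary examples of mine, not the ones used in this lab; the commands are the global counterparts of the “set” commands used in a pfr-map.

```
pfr master
 ! Global defaults, applied to every traffic class unless a pfr-map overrides them
 mode route control
 mode monitor both
 mode select-exit best
 ! Illustrative values only
 periodic 180
 delay threshold 200
```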
You can check the current configuration with the command “show pfr master”. One interesting thing about this output is that the parameters are displayed with the same syntax you use to configure them. Example: how do you configure PFR/OER to be in observe mode? With “mode route observe”.
Static definition of traffic and policy:
As I mentioned in the general setup overview, we have some very critical traffic that needs special handling. I’m going to explain this policy definition line by line, in the order you would type it:
pfr-map CRITICAL-TRAFFIC 50
Name of the PFR-MAP. It basically works the same way as a route-map: the sequence number defines the order in which PFR/OER looks at the policies defined, and IOS walks through the entries just as it does with a route-map.
match traffic-class prefix-list CRITICAL
This command defines the traffic that is going to be matched in this policy definition. Here, we are only working on a specific prefix, but you can also match on ports, DSCP, NBAR, etc.
set periodic 90
How often PFR/OER is going to try to find a better path for this specific traffic (seconds).
set mode select-exit good
Defines how PFR/OER is going to choose the best exit. There are two possible options here: “good”, which means that PFR/OER picks the first exit point that meets the IN POLICY requirements, or “best”, which means that it always tries to find the best link available, even if the current one is already IN POLICY.
set backoff 90 90
This timer controls how long the master controller holds an out-of-policy traffic class before trying to find a better link (seconds). The two values are the minimum and the maximum of the timer.
set delay threshold 80
Here we define how much delay we are going to tolerate for this traffic class. I have defined a maximum of 80 ms. If PFR/OER finds that the traffic class has a delay higher than this value, it declares the traffic class out of policy and looks for a better exit.
set mode route control
Here we tell PFR/OER to control the actual traffic. By default, it is just going to “observe”. Setting it to “control” means that PFR/OER is going to actually change the behaviour of the network.
set mode monitor both
We are going to talk more about the monitor modes in the next post. For now, keep in mind that we are going to use NetFlow information (passive) and IP SLA probes (active) to measure the performance of this traffic class.
set resolve delay priority 1 variance 10
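The resolve parameter defines which criteria PFR/OER takes into account, and in which order, when deciding the IN POLICY or OUT OF POLICY state of a traffic class. Here we are saying that delay is our first priority; the variance of 10 means that exits whose delay is within 10% of the best one are treated as equivalent.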
no set resolve range
no set resolve utilization
We are not going to take into account these two parameters. Basically, PFR/OER is configured to only look for delay (and unreachable state, that is the most important)
set active-probe echo 188.8.131.52
PFR/OER is going to send IP SLA echo probes to this target address to measure the performance. This information, plus the NetFlow information, feeds the master controller database so it can make decisions.
Here we apply the policy to the PFR/OER process.
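A minimal sketch of that last step, assuming the pfr-map name used above: the map only takes effect once it is referenced from the MC process.

```
pfr master
 ! Attach the pfr-map defined above to the MC process
 policy-rules CRITICAL-TRAFFIC
```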
The final configuration for the Master Controller will be:
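Putting the steps of this post together, the MC configuration should look roughly like the sketch below. The pfr-map lines are the ones explained above; the key chain values, the internal interface name, the prefix 192.0.2.0/24 and the probe target 192.0.2.1 are placeholders of mine, so substitute the values of your own topology.

```
! Placeholder key chain
key chain PFR
 key 1
  key-string CISCO
!
! Placeholder for the critical prefix
ip prefix-list CRITICAL seq 5 permit 192.0.2.0/24
!
pfr master
 policy-rules CRITICAL-TRAFFIC
 !
 border 10.4.4.4 key-chain PFR
  interface Ethernet0/0 internal
  interface Ethernet0/1 external
 !
 border 10.5.5.5 key-chain PFR
  interface Ethernet0/0 internal
  interface Ethernet0/1 external
!
pfr-map CRITICAL-TRAFFIC 50
 match traffic-class prefix-list CRITICAL
 set periodic 90
 set mode select-exit good
 set backoff 90 90
 set delay threshold 80
 set mode route control
 set mode monitor both
 set resolve delay priority 1 variance 10
 no set resolve range
 no set resolve utilization
 ! Placeholder probe target inside the critical prefix
 set active-probe echo 192.0.2.1
```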
You can see the results of your policy configuration:
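On releases that use the `pfr` keyword, the command for this is most likely the following, run on the MC. With no argument it should list the default policy and every pfr-map sequence.

```
! Run on the MC
show pfr master policy
```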
This output also shows what we have actually changed in the policy compared to the default values (every changed value has a * prepended).
To see the current traffic classes:
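Assuming the `pfr` keyword again, the traffic classes known to the MC are listed with:

```
! Run on the MC
show pfr master traffic-class
```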
In this output, you’ll see the traffic that you have statically defined (like we did) and also, if enabled, all the traffic that PFR/OER is learning (not in our case, as we are not auto learning anything).
And you can see the specific performance information of the traffic class, with the following command:
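One command that gives this per-prefix view is sketched below. The prefix is a placeholder of mine; use the critical prefix of your own topology.

```
! Run on the MC; 192.0.2.0/24 is a placeholder prefix
show pfr master prefix 192.0.2.0/24 detail
```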
Right now, focus only on the values I have highlighted. Basically, note that we are currently in the IN POLICY state and that we are using 10.5.5.5 as our exit BR, which is offering a delay of 2 ms on both our passive measurement (NetFlow stats) and our IP SLA probe.
If we increase the delay of the link connected to R5, PFR/OER will react to the change and will modify the actual path:
*Feb 29 22:49:57.111: %OER_MC-5-NOTICE: Route changed Prefix 184.108.40.206/24, BR 10.4.4.4, i/f Et0/1, Reason Delay, OOP Reason Range
As the output shows, we have switched over to 10.4.4.4 because there was too much delay for this specific class. Again, focus on the highlighted sections. Right now we are in a “HOLDDOWN” state, which means that PFR/OER has made a recent change and is waiting for the network to stabilize before making another change (if needed). Basically, we don’t want changes too often (that could lead to flapping). Note also that we can see how much delay the exit through 10.5.5.5 is experiencing (106 ms).
How was the change enforced? By changing the local preference of this prefix in BGP:
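A simple way to check this on the BRs is the standard BGP lookup for the prefix, sketched below with a placeholder prefix. A route controlled by PFR/OER should stand out with an unusually high local preference (5000 is the documented default, as far as I know).

```
! Run on a BR; 192.0.2.0/24 is a placeholder prefix
show ip bgp 192.0.2.0/24
```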
Keep in mind that I did not do any special configuration on the BGP side; everything is handled automatically by PFR/OER.
After a while, the traffic class goes back to an IN POLICY state:
Now, what happens if 10.4.4.4 fails? Would the MC switch over to 10.5.5.5 even if that exit is not providing a good enough link?
I’m going to shut down one of the links that 10.4.4.4 uses to forward the traffic (not the one directly connected), so you can see how it reacts.
Our active probe is failing. Look at the Unreachable value (the passive measurement is showing loss). If you wait a few seconds:
*Feb 29 23:06:03.031: %OER_MC-5-NOTICE: Route changed Prefix 220.127.116.11/24, BR 10.5.5.5, i/f Et0/1, Reason Unreachable, OOP Reason Timer Expired
And now the traffic flows through 10.5.5.5, even though that link does not comply with the configured delay:
This is just the basics, wait for the next post
The Cisco Performance Routing (PfR) / Optimized Edge Routing (OER) by CCIE Blog, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.