Ansible: Improving Performance

User1234 · 2022-02-07

I have a playbook in which I define nearly all objects and rules of the standard policy/access layer. I add hosts using the cp_mgmt_host module, looping over a YAML file that lists all the hosts, and I do the same for networks, network groups, TCP/UDP services, access roles and rules.

I have defined about 500 objects and 130 rules, and it takes more than an hour to commit them to the Check Point management server, even though its CPU, RAM and disk are nearly idle. Nothing is published during the run because auto-publish is disabled; all objects and rules are published at the end of the playbook, and only if everything in the ruleset works.

I have read about the batch flag in the Management API (https://community.checkpoint.com/t5/API-CLI-Discussion/API-performance-optimization/td-p/3003), which seems to solve this problem, and I guess the newly published module cp_mgmt_access_rules (instead of cp_mgmt_access_rule) implements it.

Is this right? If so, are there plans to implement it in other modules too? Is there anything I can do to speed things up?

EDIT: I just tried running the playbook from a server instead of a notebook, which sped things up enormously. The run time went from an hour to 6 minutes at most. This seems to be a client-side (CPU/RAM/network) issue.

PhoneBoy · 2022-02-09

What precise version of the management server are you executing this against?
In general, the issue is the large number of changes you are attempting to commit at once.
The batch flag (which I know exists for the various basic object types) does mitigate this somewhat.
In the meantime, breaking the playbook down to smaller ones with a commit at the end of each one should help with performance.
Also, upgrading the management to R81.10 should help as well.

I'm not seeing the batch flag in cp_mgmt_access_rules docs, so not sure we've implemented it there.
Adding @chkp-royl to comment.

User1234 · 2022-02-10

It is R80.10 T20, so quite recent.

Thanks for the hint about publishing. It may work (not confirmed yet), but unfortunately it does not fit our workflow. The playbook is used in a kind of GitOps setup: the current ruleset and object database are maintained in git. So even when only one object changes, every object is queried on the manager and checked against its desired state. When nothing has changed, nothing needs to be published, yet the run still takes a very long time.

PhoneBoy · 2022-02-16

You sure we're talking about R80.10?
That's End of Support.

Art_Zalenekas · 2022-02-16

I understand you are talking about R81.10.
Could you please be more specific, with an example, about what takes so long? We might be able to help you out, but we need to know the flow and what is being called. As for Ansible, are you using AWX, Tower, or just the ansible executable? Also, what does your ansible.cfg look like?

User1234 · 2022-02-17

Sorry, I mistyped. It is R81.10, of course. It is just the ansible executable, no Tower or AWX.

I am doing the following:

- name: set objects and layers
  block:

    - name: set policy/package
      check_point.mgmt.cp_mgmt_package:
        name: "{{ role_var_checkpoint_mgmt.default_policy }}"
        access: true
        auto_publish_session: yes
        installation_targets: "{{ host_var_checkpoint_mgmt.default_gw }}"

    - name: set access layer
      check_point.mgmt.cp_mgmt_access_layer:
        name: "{{ role_var_checkpoint_mgmt.default_access_layer }}"
        applications_and_url_filtering: yes
        content_awareness: yes
        firewall: yes
        implicit_cleanup_action: drop
        auto_publish_session: yes

    - name: set hosts
      check_point.mgmt.cp_mgmt_host:
        name: "{{ item.Name }}"
        ip_address: "{{ item.IPv4_address }}"
        comments: "{{ item.Comments }}"
        color: "{{ item.Color|default(role_var_checkpoint_mgmt.default_color) }}"
      with_items: "{{ cp_hosts }}"
      notify: set session
   
    - name: set networks
      check_point.mgmt.cp_mgmt_network:
        name: "{{ item.Name }}"
        color: "{{ item.Color|default(role_var_checkpoint_mgmt.default_color) }}"
        comments: "{{ item.Comments }}"
        subnet_mask: "{{ item.Mask }}"
        subnet4: "{{ item.IPv4_address }}"
      with_items: "{{ cp_nets }}"
      notify: set session

      ## more tasks about objects (groups, access-roles, services, etc.) and rules

  rescue:
    - name: discard any unpublished changes
      check_point.mgmt.cp_mgmt_discard:

I set up my policy and layer, then loop over all my objects and rules; if anything fails, I discard everything. If there are no errors, the handlers set a session, publish and install. It is all or nothing, so that no half-finished or wrongly configured policy gets installed, which is why publishing every N objects is not an option. Publishing and installing work fine; it is the looping over the objects that takes a lot of time (up to 10 seconds per object).

My object and rule "database", the config file, does not just describe the changes to be made but contains the whole ruleset, so the playbook has to identify the differences between my config file and the actually installed/published policy (which works fine, but slowly). That way the whole ruleset is always saved in git, which is great for disaster recovery. It also means that often only a few changes are made (say 4-5) while the whole ruleset still has to be parsed. "Publish every N objects" would not help here, as there are only 4-5 objects to publish.
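
For readers following the flow described above, the handler side could look roughly like the sketch below. It is an assumption, not the handlers actually used in this playbook: the handler names are made up, both handlers listen on the "set session" topic that the tasks notify, and the policy and gateway variables are simply reused from the tasks above.

```yaml
# handlers/main.yml: hypothetical sketch, not the poster's actual handlers.
# Handlers run once at the end of the play, in the order they are defined,
# and only if at least one task reported "changed" and notified them.
- name: publish session              # hypothetical handler name
  check_point.mgmt.cp_mgmt_publish:
  listen: set session

- name: install policy               # hypothetical handler name
  check_point.mgmt.cp_mgmt_install_policy:
    policy_package: "{{ role_var_checkpoint_mgmt.default_policy }}"
    targets: "{{ host_var_checkpoint_mgmt.default_gw }}"
    access: true
  listen: set session
```

Because nothing is published before the handlers run, a failure in any object or rule task still falls through to the rescue section and discards the whole session.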

ansible.cfg is quite simple:

[defaults]
roles_path = ./roles
ansible_managed = NOTICE: This file was deployed automatically, manual changes will be lost when re-deploying.

# show "[CHECK MODE]" for every task
check_mode_markers = yes

# do not display tasks that did not change anything
display_ok_hosts = no
display_skipped_hosts = no

[ssh_connection]
pipelining = True

Art_Zalenekas · 2022-02-17

OK, I understand. There is one bug that has been submitted, which might be the culprit in the object comparison.
https://github.com/CheckPointSW/CheckPointAnsibleMgmtCollection/issues/53

Once that bug is fixed, your code should run faster.
On a side note, try to move away from anything with_*: loop has been the recommended syntax since Ansible 2.5, and although with_* is not formally deprecated, it may be removed one day. Also, loop seems to work faster (take that with a grain of salt) and is a lot more flexible.
https://docs.ansible.com/ansible/latest/user_guide/playbooks_loops.html#comparing-loop-and-with
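
As an illustration, the "set hosts" task from the playbook above could be rewritten with loop roughly as follows. This is a sketch that assumes cp_hosts is a flat list of dictionaries; with_items flattens one level of nested lists and loop does not, so the two are only interchangeable for flat lists. The loop_control label is an optional addition that keeps the per-item output short.

```yaml
- name: set hosts
  check_point.mgmt.cp_mgmt_host:
    name: "{{ item.Name }}"
    ip_address: "{{ item.IPv4_address }}"
    comments: "{{ item.Comments }}"
    color: "{{ item.Color | default(role_var_checkpoint_mgmt.default_color) }}"
  loop: "{{ cp_hosts }}"             # replaces with_items
  loop_control:
    label: "{{ item.Name }}"         # print only the object name per iteration
  notify: set session
```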

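Additionally, make sure you disable fact gathering (gather_facts: no in the play, or gathering = explicit in ansible.cfg) and enable the timer and profile_tasks callbacks via callback_whitelist (renamed callbacks_enabled in newer Ansible releases). That gives you a baseline. Also, see whether you can run some of the tasks with async. I have to test that myself, as the httpapi connection plugin is in use here.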
https://docs.ansible.com/ansible/2.4/playbooks_async.html
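
At the play level, turning off fact gathering is a single keyword. The sketch below is only an assumption about how the surrounding play might look: the inventory group and role name are hypothetical placeholders. The callback plugins are enabled in ansible.cfg or through environment variables, not in the play itself.

```yaml
- name: configure Check Point management
  hosts: check_point            # hypothetical inventory group
  connection: httpapi
  gather_facts: false           # skip the fact-gathering step before the tasks
  roles:
    - checkpoint_mgmt           # hypothetical role holding the tasks shown above
```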

Fact caching can also help, but again, I don't know how much, as we are dealing with a single host and the httpapi plugin.
https://www.linkedin.com/pulse/how-speed-up-ansible-playbooks-drastically-lionel-gurret/

My best guess is to wait until the bug is fixed; that might help with the comparison speed.

Art_Zalenekas · 2022-02-09

The new module cp_mgmt_access_rules does not execute in batch; rather, it creates the rules and layers and puts them in the proper order. Before creating the rules, make sure you create your layers first.

As for the performance gains, you are doing it right by using one task and looping over it with the variables of each rule.
For the commit/publish side, I would strongly suggest you publish every 100 items (any objects in general).

In my loop I added loop_control for label and index_var:

any_errors_fatal: true
loop: "{{ add_rules }}"
loop_control:
  label: "{{ item.name }}"
  index_var: index

Then I use some modulo-based logic on auto_publish_session to publish the changes in batches:

auto_publish_session: "{{ true if (index + 1) % 50 == 0 else false }}"

This way the changes are published faster. A modulus of 50 actually corresponds to 100 item changes, as each rule accounts for 2 object changes; you can see that when you make a change to a single rule. Feel free to adjust it as you see fit for your environment.
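
To see which iterations the modulo expression selects, a small stand-alone play like the one below can help. It is only a sketch: batch_size and demo_items are hypothetical names, and the range of 120 numbers stands in for the real add_rules list. Since index_var counts from 0, (index + 1) is the 1-based item number.

```yaml
- name: Show which iterations would publish with a batch size of 50
  hosts: localhost
  gather_facts: false
  vars:
    batch_size: 50                              # hypothetical variable name
    demo_items: "{{ range(0, 120) | list }}"    # stand-in for add_rules
  tasks:
    - name: Report the publish points
      ansible.builtin.debug:
        msg: "publish after item {{ index + 1 }}"
      loop: "{{ demo_items }}"
      loop_control:
        index_var: index
      # Same condition as in auto_publish_session: true on every 50th item.
      when: (index + 1) % batch_size == 0
```

With 120 items the condition is true only for items 50 and 100, so the last 20 items stay in the open session until a final publish runs.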

Here is my entire task call that comes from another include_tasks of the parent playbook:

- name: Loading Rules data variable
  include_vars: rules/rules_data.yml
- name: Add Rules and Layers
  cp_mgmt_access_rule:
    layer: "{{ item.layer }}"
    position: "{{ item.position }}"
    name: "{{ item.name | default() }}"
    action: "{{ item.action | default('Drop') }}"
    action_settings: "{{ item.action_settings | default(omit) }}"
    content: "{{ item.content | default([]) }}"
    content_direction: "{{ item.content_direction | default() }}"
    content_negate: "{{ item.content_negate | default(false) }}"
    custom_fields: "{{ item.custom_fields | default(omit) }}"
    destination: "{{ item.destination | default([]) }}"
    destination_negate: "{{ item.destination_negate | default(false) }}"
    enabled: "{{ item.enabled | default(false) }}"
    inline_layer: "{{ item.inline_layer | default(omit) }}"
    install_on: "{{ item.install_on | default([]) }}"
    service: "{{ item.service | default([]) }}"
    service_negate: "{{ item.service_negate | default(false) }}"
    source: "{{ item.source | default([]) }}"
    source_negate: "{{ item.source_negate | default(false) }}"
    time: "{{ item.time | default(omit) }}"
    track: "{{ item.track | default(omit) }}"
    user_check: "{{ item.user_check | default(omit) }}"
    vpn: "{{ item.vpn | default('Any') }}"
    comments: "{{ item.comments | default() }}"
    ignore_warnings: "{{ item.ignore_warnings | default(false) }}"
    ignore_errors: "{{ item.ignore_errors | default(false) }}"
    auto_publish_session: "{{ true if (index + 1) % 25 == 0 else false }}"
  any_errors_fatal: true
  loop: "{{ add_rules }}"
  loop_control:
    label: "{{ item.name }}"
    index_var: index
  notify: Publish Handler

Good luck!

User1234 · 2022-02-10

So the only difference between the two access_rule modules is the part about the layers? And there is no performance gain from sending all rules at once rather than rule by rule? As for publishing, the same applies as in my answer to PhoneBoy. But thanks for the rule task. It's nice to see an example that goes beyond the simple ones in the documentation.

Art_Zalenekas · 2022-02-10

I think we are talking about two different things. 1) The difference between the Ansible modules access_rule and access_rules should be minimal (I did not test it, but you can look at the code of the collection). 2) Batched API calls will definitely help performance.
Auto-publishing every 100 items will help a lot as well. Let us know if you have any other questions or concerns. Good luck!
