Revision: 36849
Initial Code
Initial URL
Initial Description
Initial Title
Initial Tags
Initial Language
at November 28, 2010 15:53 by ge01f
Initial Code
# Red Eye Monitor (REM) by Geoff Howland
#
# Suite Package: For use with the REM Package system, which is a generic system
# for creating runnable code, services, web pages and RPC handlers.
#

# Package information
package:
  name: "Red Eye Monitor"
  version: "2010.11.22.00"

  # Stability: unstable (development is occurring) ->
  #   testing (development has stopped) ->
  #   accepted (>N [10 medium?] sites confirm running) ->
  #   stable (>10 days w/o new bug reports) ->
  #   proven (>90(?) days w/o new bug reports)
  #TODO(g): "accepted" will be judged by me initially, because there is no
  #   user base, and also no integrated bug tracking issue tests, so all of
  #   this is manual until automation for it arrives
  stability: unstable

  short: rem
  author: Geoff Howland
  maintainer: Geoff Howland
  contributors: []
  info: "Comprehensive cloud automation for lazy control freaks"
  website: http://redeyemon.wordpress.com/

# This is a Suite package, which means it is meant to be a collection of
# packages and web stuff. It is a high-level master-application type package.
type: suite

# This is the script that launches the State scripts, and runs the State
# Evaluator if necessary
#NOTE(g): This is a standard launcher, which can be used on any package. If
#   you want to customize it, rename it to something like "name_launcher.py"
#   so it's obvious you have modified it and it is not the default launcher.
launcher: launcher.py

# This is the State Evaluator script for this package. If the state is
# left blank ("") it will be run on mounting the package. If a state script
# finished running, and did not set a new state, it is run to determine
# the next state.
#TODO(g): Is this generic or specific? It's just going to process the states?
#   If so, then it is generic, and isn't needed here. The Package module
#   can do it all, and just RUN the script that's in the current system...
#   This changes things though, but making a handler custom would be a pain,
#   no one will want to do it...
#TODO(g): Generic State Evaluator will simply set the state to "active"
#DEFAULT:#state evaluator: scripts/state_evaluator.py
state evaluator: rem_state_evaluator.py

# Paths to use when working with this package
paths:
  # Where our scripts live; this is a sandbox for some security
  #NOTE(g): I only do this for scripts, as they have an execution component,
  #   so this will aid in sandboxing their runs
  #TODO(g):SECURITY: There is a lot to do here later, for now do it simply
  script: package/rem/package/scripts/

  # data, static and other paths work off the base package path prefix
  #NOTE(g): This is because I think the paths to the data files look clearest
  #   this way. Scripts are usually prefixed by something that says "script"
  #   in the key, but data files may have lots of interesting names. The
  #   "data/" prefix in the path adds information, that this is a data request,
  #   and not a block. Basically, I think it's hard to read.
  base: package/rem/package/

# Modules are not run, but they exist as named plug-ins to the package.
# Using modules allows a flexible and universal method of making resource
# groups for the package state handler. Use them for including groups of
# scripts, or groups of data sources, or whatever your package might need.
# Because they are just a data container for your scripts to use, they are
# completely free for any purpose. The advantage of this is that accessing
# resources is the same from any package script, and any package script can
# access the resources of any other package module by:
#
#   GetPackageModule(group, name)
#
#TODO(g): Implement: GetPackageModule(group, name)
#TODO(g):SECURITY:HIGH: Add filters to allow/deny modules to outside sources,
#   and name specific sources for allow/deny. ACLs. Make general library,
#   so sharedstate/locks/counters and message queues can also use ACLs.
#   This is of primary concern for RPC calls, and an ACL should be able to
#   specify not allowing a DynamicRPC or other calls to access data. (LATER)
modules:
  # Module Groups are used to give name spaces to the elements in the module
  #TODO(g):REMOVE: These are monitoring modules, but it's an example
  monitors:
    ping:
      remote: true
      script:
        - monitors/ping.py
    snmp:
      remote: true
      script:
        - monitors/snmp.py
    tcp:
      remote: true
      script:
        - monitors/tcp.py
    http:
      remote: true
      script:
        - monitors/http.py
    local:
      remote: false
      script:
        - monitors/local.py
    collector:
      remote: true
      script:
        - monitors/collector.py

# Scripts to process the results of module scripts
#NOTE(g): This will be passed along to the Job Scheduler Suite.
#   This allows us to have custom processing for results from modules that
#   we register with the Job Scheduler to run
module result processors:
  monitors: scripts/process/monitor_processor.py

# Alternative method:
# A module result specification, which defines how to deal with the results,
# in the standard way that I deal with results. This is good for the
# monitoring type situations.
module result specifications:
  monitors: data/monitor/monitor_result_processor.yaml

# HTTP, RPC stuff
#TODO(g): Come up with a better fucking name than "communications", it sucks
#NOTE(g): Is "web" much better? I liked more generic better...
communication:
  # Will look in the static path for files referenced by relative path if they
  # are not caught by HTTP path matches.
  static:
    path: static/html/

  # Non-User defined HTTP page entries.
  #NOTE(g): User defined is in: ./data/web/user_pages.yaml
  http:
    # Show internals page
    #TODO(g): This could be better, all the way around...
    show:
      run:
        - script: scripts/web_demo/show.py

    # Show internals page
    #TODO(g): Move this to the dropStar Suite Package
    admin:
      run:
        - script: scripts/web_demo/admin.py
          template:
            path: static/html/simple.html

    # Show internals page
    #TODO(g): Move this to the dropStar Suite Package
    widgets:
      run:
        - script: scripts/web_demo/widgets.py
          template:
            path: static/html/simple.html

    # Import all content in this YAML file as pages for this package
    #TODO(g): Move this to the dropStar Suite Package
    __load: data/web/user_pages.yaml

  # RPC calls: JSON data over XmlHttpRequest JS call (via web)
  rpc:
    #TODO(g): Clean up the cruft in this file, half of these aren't used any more
    "":
      run:
        - script: scripts/web_demo/search.py
    Search:
      run:
        - script: scripts/web_demo/search.py

    # Reload specified 'widgets'. List or space delimited: widgets have no spaces
    #TODO(g): This one is universal? Could using specific ones to keep
    #   context/position be a good practice?
    ReloadWidgets:
      run:
        - script: scripts/admin/reload_widgets.py

    #TODO(g): This one is universal? Could using specific ones to keep
    #   context/position be a good practice?
    DynamicRPC:
      run:
        #TODO(g): This will need to use the PACKAGE name (mounted name) to
        #   access the proper script now, since they can be dynamically
        #   mounted anywhere...
        - script: scripts/dynamic/dynamic_rpc.py

    # Monitoring
    #TODO(g): Use DynamicRPC and get rid of these...
    MonitorHostList:
      run:
        - script: scripts/monitor_admin/host_list.py
    MonitorHostView:
      run:
        - script: scripts/monitor_admin/host_view.py
    MonitorGraphList:
      run:
        - script: scripts/monitor_admin/graph_list.py
    MonitorGraphView:
      run:
        - script: scripts/monitor_admin/graph_view.py
    MonitorGraphViewDialog:
      run:
        - script: scripts/monitor_admin/graph_view_dialog.py

    #TODO(g): Later this will do more things, for now it's a single purpose menu
    #TODO(g): Use DynamicRPC and get rid of these...
    MonitorManageMonitors:
      run:
        - script: scripts/monitor_admin/create_monitor_dialog.py
    CreateMonitor:
      run:
        - script: scripts/monitor_admin/create_monitor.py
    DeleteMonitorDialog:
      run:
        - script: scripts/monitor_admin/delete_monitor_dialog.py
    DeleteMonitorConfirmed:
      run:
        - script: scripts/monitor_admin/delete_monitor.py

    #TODO(g): Later this will do more things, for now it's a single purpose menu
    #TODO(g): Use DynamicRPC and get rid of these...
    MonitorManageHosts:
      run:
        - script: scripts/monitor_admin/create_host_dialog.py
    CreateMonitorHost:
      run:
        - script: scripts/monitor_admin/create_host.py
    DeleteMonitorHostDialog:
      run:
        - script: scripts/monitor_admin/delete_host_dialog.py
    DeleteMonitorHostConfirmed:
      run:
        - script: scripts/monitor_admin/delete_host.py
    CreateDashboardDialog:
      run:
        - script: scripts/monitor_admin/create_dashboard_dialog.py
    CreateDashboardConfirmed:
      run:
        - script: scripts/monitor_admin/create_dashboard.py
    AlertVisualizerView:
      run:
        - script: scripts/monitor_admin/alert_visualizer.py
    AlertVisualizerDialog:
      run:
        - script: scripts/monitor_admin/alert_visualizer_dialog.py

    # Work on many selected hosts
    #TODO(g): Use DynamicRPC and get rid of these...
    MonitorAddDefaultMonitorsToHosts:
      run:
        - script: scripts/monitor_admin/selected_hosts_add_default_monitors.py
    MonitorAddDefaultMonitorsToHostsDialog:
      run:
        - script: scripts/monitor_admin/selected_hosts_add_default_monitors_dialog.py
    MonitorDeleteSelectedHosts:
      run:
        - script: scripts/monitor_admin/selected_hosts_delete.py
    MonitorDeleteSelectedHostsDialog:
      run:
        - script: scripts/monitor_admin/selected_hosts_delete_dialog.py

    # Mother's RPC
    GetHost:
      run:
        - script: scripts/mother/get_host.py
    GetAllHosts:
      run:
        - script: scripts/mother/get_all_hosts.py

    #TODO(g): Move to dropStar Suite Package
    #TODO(g): Use DynamicRPC and get rid of these...
    UpdateAdmin:
      run:
        - script: scripts/web_demo/admin.py
    UpdateWidgets:
      run:
        - script: scripts/web_demo/widgets.py
    CreatePageDialog:
      run:
        - script: scripts/admin/create_page_dialog.py
    CreatePage:
      run:
        - script: scripts/admin/create_page.py
    CreateWidgetDialog:
      run:
        - script: scripts/admin/create_widget_dialog.py
    CreateWidget:
      run:
        - script: scripts/admin/create_widget.py
    CollectFieldSet:
      run:
        - script: scripts/web_demo/collect_fieldset.py
    ViewPageWidgets:
      run:
        - script: scripts/admin/view_page_widgets.py
    EditPageDialog:
      run:
        - script: scripts/admin/edit_page_dialog.py
    EditPageSave:
      run:
        - script: scripts/admin/create_page.py
    DeletePageDialog:
      run:
        - script: scripts/admin/delete_page_dialog.py
    DeletePageConfirmed:
      run:
        - script: scripts/admin/delete_page.py
    ClonePageDialog:
      run:
        - script: scripts/admin/clone_page_dialog.py
    ClonePage:
      run:
        - script: scripts/admin/clone_page.py
    EditWidgetDialog:
      run:
        - script: scripts/admin/create_widget_dialog.py
    EditWidgetSave:
      run:
        - script: scripts/admin/edit_widget_save.py
    DeleteWidgetDialog:
      run:
        - script: scripts/admin/delete_widget_dialog.py
    DeleteWidgetConfirmed:
      run:
        - script: scripts/admin/delete_widget.py

    #TODO(g): Later this may do more, for now it just creates pages...
    #TODO(g): Use DynamicRPC and get rid of these...
    ManagePages:
      run:
        - script: scripts/admin/create_page_dialog.py

    #TODO(g): Later this may do more, for now it just creates widgets...
    #TODO(g): Use DynamicRPC and get rid of these...
    ManageWidgets:
      run:
        - script: scripts/admin/create_widget_dialog.py

# State machine for this package
state machine:
  # Starting state for this package
  state: initial

  # These contexts will be available as the active state data, when they are set
  context:
    # Startup context
    initial:
      # Number of times this script has run or completed
      script run times: 0
      script completed times: 0
      # Script data for this context
      script:
        platform:
          # Cross platform run block
          xplat:
            - script: script/rem/initial.py

    # Active context
    active:
      # Number of times this script has run or completed
      script run times: 0
      script completed times: 0
      # Script data for this context
      script:
        platform:
          # Cross platform run block
          xplat:
            - script: script/rem/active.py

    # Shutdown context
    shutdown:
      # Number of times this script has run or completed
      script run times: 0
      script completed times: 0
      # Script data for this context
      script:
        platform:
          # Cross platform run block
          xplat:
            #TODO(g): Could send regional collectors information about the shutdown,
            #   including the NEW mother to use (if this is mother), and why this
            #   is being shut down, so they can log it, and know WTF is up...
            - script: script/rem/shutdown.py

## Public key of author, verify that this package was produced by the author
##TODO(g): ...
#pubkey: null
#
## Signed application data, SHA1 sum of things that matter...
##TODO(g): ...
#signed: null

#TODO(g): What to do about this thing? GLOBAL! Move it to global shared state!
## Sites that are served by this dropSTAR instance
#data:
#  time series path: /timeseries/

# These packages should be ON this machine, but aren't MOUNTED by this package
#TODO(g): Can pass args into the required packages, for use in interfacing
#   with this package? Could automate connectors or something, ways to
#   interface or modify/filter data...
requires packages: {}

# Mount these packages
#TODO(g): Mount options, "mount as" for a different package name; things
#   access packages by their names, so the default name is important, but a
#   "mount as" could provide an alternative way to use that package.
#   Also can provide an over-ride package handler, rather than the script it
#   specifies in the package...
#   Could attach additional monitors, or specify the logging target, and
#   shit like that...
#TODO(g): LOGGING TARGETS! This is the right place to assign them, then
#   use them automatically in the ProcessBlock code, passed down with the
#   request_state information, or something.
mount packages:
  # The Package Mounter... Really, in the package? How does THIS package
  # get mounted? Seems like it needs to be a ProcessBlock type feature...
  # Or better yet, just IN a procbloc.
  MonitorPackage: {}
  WebStuff: {}

# Jobs are different than the state machine. They can be run as cron, or
# against the boolean result of a test-script (with cron), or can be
# invoked over RPC, and their status/results/duration are associated with
# the state machine, since they are made to operate on the state machine.
jobs:
  # Store monitor results in the "monitor.results.store" queue
  #NOTE(g): This allows us to separate processing I/O from other tasks, win!
  monitor_storage:
    platform:
      freebsd:
        - script: scripts/monitor/queue_storage.py
          interval: 5
      xplat:
        - script: scripts/monitor/queue_storage.py
          interval: 5

  #TODO(g): Change this to pulling data out of the "monitor.results.analyze" queue...
  alert_sla_monitoring:
    platform:
      xplat:
        - script: scripts/monitor/alert_sla.py
          interval: 5

  #TODO(g): Conditionally start SLA monitoring! (Test with global lock)
  alert_sla_outage_handler:
    platform:
      xplat:
        - script: scripts/monitor/alert_sla_outage.py
          interval: 30

#TODO(g): Move all this stuff into jobs or the state machine
#
##TODO(g): Conditionally start node monitors! (Test with global lock)
#run workers:
#  central_monitoring:
#    - script: scripts/monitor/node/node_monitor.py
#
#  #TODO(g): Run workers needs to be reformatted...
#  #minimum: 5
#  #maximum: 20
#  #work:
#  #  - script: scripts/monitor/node/node_monitor.py
#  #
#  #run_simultaneous:
#
#  #TODO(g): Conditionally start web server? (Test with global lock)
#  #
#  # Should CONDITIONALLY start this, if any package mounted has http/rpc stuff
#  httpd:
#    - script: scripts/httpd/__init__.py
#
#  #TODO(g): Run shared_state_sync automatically if any packages have
#  #   "load state" data...
#
#  # Synchronize our shared state system, saving any deferred writes
#  shared_state_sync:
#    - script: scripts/admin/shared_state_sync.py
#
#  #TODO(g): Conditionally start SLA monitoring! Uses the Job Scheduler...

# --- Only load these for the REM central server: Mother ---

# Load sharedstate buckets, so that running state can have persistence
#NOTE(g): Specifies the path to load the state. If a "%s" is present, then
#   parse the %s string for the key name, and store the contents of each
#   YAML file into a key in the bucket "__timeseries.ts".
#   If the "%s" were not present, it would simply load the contents of this
#   YAML file into the keys of YAML data. If the YAML data was not in
#   dict format, then it would store the data wholly in the key "default".
#NOTE(g): When any of this data is changed, it will be automatically saved
#   back into the specified files, as the save files are registered against
#   the bucket names in the "__sharedstate.save.registered" private bucket.
load state:
  # --- SPLIT: The below stuff is for the GUI/web system, and should be in a
  #   dropStar Suite of its own! ---
  #
  # Load up each of the page widgets, into its page.ui.widgets key (on page)
  ui.page.widgets: data/web/user_page_widgets/%s.yaml
  # User created pages
  ui.pages: data/web/user_pages.yaml
  # HTML Templates
  ui.templates: data/web/html_templates.yaml
  # Dynamic Widgets: Used to dynamically generate page content from complex
  #   comprehensive data descriptions
  ui.dynamic_widgets: data/web/dynamic_widgets/%s.yaml

  # --- SPLIT: GUI from REM stuff. Above is GUI and should go into a dropStar
  #   Suite of its own! ---
  #
  # Load up each of the monitors of each host (key is host's FQDN)
  monitors.hosts: data/monitor/hosts.yaml
  # Host Groups
  monitors.host_groups: data/monitor/host_groups.yaml
  # Monitoring: Alerts: key=alert name
  #NOTE(g): SLAs are stored in alert['sla'] = {}. They are alert specific,
  #   so cannot be keyed on their own name in their own file, without effort.
  monitors.alerts: data/monitor/alerts.yaml
  # Monitoring: Roles: key=role name
  monitors.roles: data/monitor/roles.yaml
  # Monitoring: Contacts: key=contact user name
  monitors.contacts: data/monitor/contacts.yaml
  # Monitoring: Silences: dict of dicts (should be list of dicts)
  monitors.silences: data/monitor/silences.yaml
  # Monitoring: Shifts: key=shift name
  monitors.shifts: data/monitor/shifts.yaml
  # Monitoring: Shift Filters: key=shift filter name
  monitors.shift_filters: data/monitor/shift_filters.yaml
  # Monitoring: Outages: dict of dicts: Active outages
  monitors.outages: data/monitor/outages.yaml
  # Monitoring: Outage Groups: dict of dicts, Active Outage Groups
  monitors.outage_groups: data/monitor/outage_groups.yaml
  # Monitoring: Outage Groups: dict of dicts, Historical Outage Groups
  monitors.outage_groups.history: data/monitor/outage_groups_history.yaml
  # Monitoring: Outages: dict of dicts: Historical outages (completed)
  monitors.outages.history.unack: data/monitor/outages_history_unack.yaml
  # Monitoring: Outages: dict of dicts: Historical outages (completed)
  monitors.outages.history.ack: data/monitor/outages_history_ack.yaml
  # Monitoring: Notifications: dict of dicts: History of emails/SMSs/etc
  monitors.notifications: data/monitor/alert_notifications.yaml
  # Monitoring: Dashboard: dict of dicts
  monitors.dashboard: data/monitor/dashboard.yaml
  # Monitoring: Globals: dict of values
  monitors.globals: data/monitor/globals.yaml

  # Keep track of when we last rendered graphs, to ensure we don't do them
  # all again immediately on restart
  #NOTE(g): If you're wondering why startup takes a lot of time, this is why.
  #TODO(g): Can't I remove this now? Graphing is now client side, I don't think
  #   this is required anymore... Even if we keep them, we don't need to save it
  #__timeseries.last_rendered: data/monitor/timeseries_last_rendered.yaml

#TODO(g): I kind of like this specification model, where I could give it a
#   full name, or use the path and key, but I have to explicitly save it.
#   This way there can be 10000s of locks in the system, but only the specified
#   ones are stored, and are done so by the package that uses them.
#NOTE(g): Use namespaces with dot separation to ensure package separation.
#   Not stomping on your data is up to you!
#TODO(g): Add test with warning/error if packages try to use the same variables
#   in their usage.
#load locks:
#  monitors.lock01: data/monitor/locks/%s.yaml
#  monitors.lock02: data/monitor/locks/%s.yaml
#  monitors.lock03: data/monitor/locks/%s.yaml

#TODO(g): Remove this once the individual lock method is done, this doesn't work
#   in a packaged environment... Leaving temporarily only for design history.
#######TODO(g): Stored locks: all the locks?
#######TODO(g): Does it ever make sense to store SOME locks? SOME counters?
#######   It seems to me they shouldn't be that plentiful, and they, together, are
#######   the shared state. Clear them if you want, but save them all for peace
#######   of mind that they can be trusted to come back.
#######TODO(g): Figure out the delays on saves, ESPECIALLY for counters, which
#######   by their definition are meant to change in a frequent manner, linear
#######   or exponential to requests
#######load locks: data/monitor/locks.yaml

# Load the stored counters
#NOTE(g): Being listed here, they will automatically be registered to be
#   saved, without any delays, so these files should always be accurate in
#   terms of the latest counter values
#TODO(g):DESIGN: Switch this to just specify a directory? Then the counters
#   can be specified in one line. This way is just more typing...
#   Per file has the benefit of them working without any starting files,
#   and also not saving counters that are used in the code, but not specified
#   to be saved. The method of one entry per counter is more explicit...
load counters:
  #TODO(g): Should these be moved into the controllers for this data?
  #   Maybe the rest of them should be too...
  monitors.outages: data/monitor/counters/monitors.outages
  monitors.outage_groups: data/monitor/counters/monitors.outage_groups
  monitors.notifications: data/monitor/counters/monitors.notifications
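The "load state" NOTE above describes how a "%s" in a path is parsed back out of each matching filename to produce a bucket key. The loader itself is not part of this snippet; here is a minimal Python sketch of that key-name parsing, where the function name and approach are assumptions, not REM's actual code:

```python
import re


def expand_state_pattern(pattern, filenames):
    """Map shared-state bucket keys to file paths.

    If the pattern contains a "%s", each matching filename yields one
    bucket key (the text the "%s" matched). With no "%s", the single
    file would be loaded as-is (non-dict data going under "default",
    per the NOTE above).
    """
    if '%s' not in pattern:
        return {'default': pattern}
    # Turn "data/web/user_page_widgets/%s.yaml" into a regex that
    # captures the key name where the "%s" sits
    regex = re.compile(
        '^%s$' % re.escape(pattern).replace(re.escape('%s'), '(.+)'))
    keys = {}
    for name in filenames:
        match = regex.match(name)
        if match:
            keys[match.group(1)] = name
    return keys


files = ['data/web/user_page_widgets/front.yaml',
         'data/web/user_page_widgets/admin.yaml']
expand_state_pattern('data/web/user_page_widgets/%s.yaml', files)
# -> {'front': 'data/web/user_page_widgets/front.yaml',
#     'admin': 'data/web/user_page_widgets/admin.yaml'}
```

In the real system the filename list would presumably come from a glob over the data directory, and each file's YAML contents would then be stored under its key in the named bucket.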
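Both the "jobs" and "state machine" sections use the "platform: {freebsd: ..., xplat: ...}" shape, where a platform-specific run block wins over the cross-platform "xplat" fallback. The launcher that does this lookup is not included in the snippet; a sketch of the selection logic it would need (the function and matching rule are my assumptions):

```python
import sys


def select_run_block(platform_blocks, platform=None):
    """Pick the run block for this platform, falling back to "xplat".

    platform_blocks is the dict under a job's "platform:" key, e.g.
    {"freebsd": [...], "xplat": [...]}. Platform names are matched as
    prefixes of sys.platform ("freebsd" matches "freebsd8", etc.).
    """
    if platform is None:
        platform = sys.platform  # e.g. "freebsd8", "linux2", "darwin"
    for name, block in platform_blocks.items():
        if name != 'xplat' and platform.startswith(name):
            return block
    return platform_blocks.get('xplat', [])


monitor_storage = {
    'freebsd': [{'script': 'scripts/monitor/queue_storage.py', 'interval': 5}],
    'xplat': [{'script': 'scripts/monitor/queue_storage.py', 'interval': 5}],
}
select_run_block(monitor_storage, 'freebsd8')  # -> the "freebsd" block
select_run_block(monitor_storage, 'linux2')    # -> the "xplat" fallback
```

This also explains why the "monitor_storage" job can list identical freebsd and xplat entries: the freebsd block exists so platform-specific options can later diverge without restructuring the job.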
Initial URL
Initial Description
In-development version, with comments left in and nothing stripped. This is YAML, not Python.
Initial Title
REM Package Example
Initial Tags
Initial Language
YAML