Site Reliability Engineer
DALLAS-75202, TX, US
09/30/2019
Company
Infinity Consulting Solutions, Inc
Experience
2 to 5 Year(s)
Job Description
SITE RELIABILITY ENGINEER
North Dallas/Addison, TX
Direct Hire
ICS is hiring immediately for a Site Reliability Engineer whose mission will be to identify problems before they happen, bring order to chaos, empower coworkers to help themselves, and improve the overall quality of life for peers.
The Site Reliability Engineer will focus heavily on the database tier and will also cover the application and web services tiers. In general, the SRE will be responsible for availability, latency, performance, efficiency, change management, monitoring, and emergency response.
What You'll Do:
Regularly evaluate hardware usage for Production instances of software products, with the goal of finding areas for improvement and stress points that might become bottlenecks. (Active Resourcing Analysis)
Database Stress Points (high load queries, hardware load, etc.)
Storage Usage
Server Load Monitoring
Develop and maintain a health monitoring system for all clients (Active Health Monitoring)
Monitor error rates by day and build out trend lines
Establish trigger points for a Severity 1/Severity 2 early warning system to reduce net outage time.
The goal is to be notified of an outage before Client Success/Support groups detect it, and ideally before Clients are aware of it.
Establish trigger points to provide early warning of integration failures.
Develop a list of operational metrics for the system and build out trend lines over time (Operational metrics and trends)
Determine average load per client and estimated capacity per client
Determine average business-facing operational times (e.g., Time to Submit, Time to Accept, average Job Processing throughput, average Job Processing by Job type)
Determine known stressors by client (including but not limited to: known periods of high traffic, known types of activity that stress the system)
Collaborate with Development to map trends in tickets to Development projects aimed at addressing root causes.
Build out a list of active client Reports covering intended usage, expected delivery schedule, importance to the client's business process, and current configuration, including but not limited to: the system each report executes on, the reporting methodology, and compatibility concerns (Client Report Management)
Skills/Experience You'll Need:
5+ years of overall enterprise IT experience
Hands-on experience with T-SQL and MS SQL Server
Advanced Knowledge of Operational Best Practices for Mission Critical Software:
Risk Assessment
Migration Plans
Critical Situation Evaluation and Resolution
2+ years of experience with:
Windows Server Environments
Enterprise Networking
JScript
PowerShell
AWS or Azure cloud experience (Azure preferred)
Experience with the following is highly preferred:
Knowledge of .NET Framework and Languages (C#, VB.NET)
Knowledge of monitoring tools
SolarWinds Ignite
Datadog
Knowledge of Web Development Practices and Technologies
MVC
ASP
JavaScript
Hangfire
MailBee
Elasticsearch
RabbitMQ
Advanced Knowledge of Microsoft Hosting Technologies
Windows Server
IIS
MS SQL Server
Windows Workflow Foundation (WF)
Information Technology
Full-Time Job
Candidate Requirements
Bachelor's degree
Walk-in Information
9/26/2019
Recruiter Details
Doug Klares
1350 Broadway, Suite 2205,
NEW YORK-10018, NY