Job Details

Site Reliability Engineer

DALLAS-75202, TX, US
09/30/2019

Required Skills

    VB.NET, JScript
Company

Infinity Consulting Solutions, Inc

Experience

2 to 5 Year(s)

Job Description

SITE RELIABILITY ENGINEER

North Dallas/Addison, TX

Direct Hire

ICS is hiring immediately for a Site Reliability Engineer whose mission will be to identify problems before they happen, bring order to chaos, empower coworkers to help themselves, and improve the overall quality of life for their peers.

The Site Reliability Engineer will focus heavily on the database tier, but will also work across the application and web services tiers. In general, the SRE will be responsible for availability, latency, performance, efficiency, change management, monitoring, and emergency response.

What You'll Do:

Regularly evaluate hardware usage for Production instances of software products, with the goal of finding areas for improvement or stress points that could become bottlenecks (Active Resourcing Analysis)

Database Stress Points (high load queries, hardware load, etc.)

Storage Usage

Server Load Monitoring

Develop and maintain health monitoring system for all clients (Active Health Monitoring)

Monitor error rates by day and build out trend lines

Establish trigger points for a Severity 1/Severity 2 early warning system to reduce net outage time.

The goal is to be notified of an outage prior to it being detected by Client Success/Support groups and perhaps even before Clients are aware of it.

Establish trigger points to provide early warning of integration failures.

Develop list of operational metrics for system and build out trend lines over time (Operational metrics and trends)

Determine average load per client and estimated capacity per client

Determine average business-facing operational times (e.g., Time to Submit, Time to Accept, average Job Processing throughput, average Job Processing by Job type)

Determine known stressors by client (including but not limited to: known periods of high traffic and known types of activity that stress the system)

Collaborate with Development to map trends in tickets to Development projects aimed at addressing root causes.

Build out a list of active client Reports with their intended usage, expected delivery schedule, importance to the client's business process, and current configuration, including but not limited to: the system each report executes on, the reporting methodology, and compatibility concerns (Client Report Management)

Skills/Experience You'll Need:

5+ years of overall enterprise IT experience

Hands-on experience with T-SQL and MS SQL Server

Advanced Knowledge of Operational Best Practices for Mission Critical Software:

Risk Assessment

Migration Plans

Critical Situation Evaluation and Resolution

2+ years of experience with:

Windows Server Environments

Enterprise Networking

JScript

PowerShell

AWS or Azure cloud experience (Azure preferred)

Experience with the following is highly preferred:

Knowledge of .NET Framework and Languages (C#, VB.NET)

Knowledge of monitoring tools

SolarWinds Ignite

Datadog

Knowledge of Web Development Practices and Technologies

MVC

ASP

JavaScript

Hangfire

MailBee

Elasticsearch

RabbitMQ

Advanced Knowledge of Microsoft Hosting Technologies

Windows Server

IIS

MS SQL Server

Windows Workflow Foundation (WF)




Others
Information Technology

No Preference
Full-Time Job
Other
1

Candidate Requirements
Bachelor's degree

Walk-in Information
9/26/2019

Recruiter Details
Doug Klares
1350 Broadway, Suite 2205, NEW YORK-10018, NY