Sr Lead/Architect - Site Reliability Engineer

Chennai, Bangalore, Hyderabad
Job Description
Tiger Analytics is a global AI and analytics consulting firm. With data and technology at the core of our solutions, our tribe of 3,900+ is solving problems that eventually impact the lives of millions globally. Our culture is modelled around expertise and respect with a team-first mindset. We are headquartered in Silicon Valley, with delivery centres across the globe, offices in multiple cities across India, the US, UK, Canada, and Singapore, and a substantial remote global workforce.
We’re Great Place to Work-Certified™. Working at Tiger Analytics, you’ll be at the heart of an AI revolution. You’ll work with teams that push the boundaries of what is possible and build solutions that energize and inspire.

About the role
· Analyse existing Service Level Objectives (SLOs), and create and maintain new ones.
· Troubleshoot, evaluate, and resolve operational challenges that affect defined SLOs.
· Identify architectural and application bottlenecks observed in the landscape, and help define and implement improvements.
· Work with other engineering stakeholders on resolving larger architectural bottlenecks.
· Work in close collaboration with software development teams to consult on scaling concerns.
· Contribute to the future roadmap of software development teams and establish strong operational readiness across teams.
· Scale systems through automation, improving change velocity and reliability.
· Leverage technical skills to partner with team members and be comfortable diving into a problem as needed.
· Work to enable other teams to scale through automation, knowledge-sharing, and self-service activities.
· Automate every operational task; this is a core requirement for the role. Examples include package updates, configuration changes across all environments, and tools for automatic provisioning of user-facing services.
· Respond to platform emergencies, alerts, and escalations from Customer Support.
· Ensure systems exist to manage software life cycles (e.g. operating systems) with a minimum of manual effort.
· Develop a fully automated multi-environment observability stack based on the tool sets available in the landscape, and extend it to predict capacity needs based on usage patterns.
· Plan for new service rollouts, expansion and capacity management of existing services, and work with users to optimise their resource consumption.
· Establish clear ongoing cloud efficiency metrics, defining both how we should measure success and the methods to achieve improved results.
· Implement tools, practices, and processes to enable other teams to contribute to efficiency in their areas.
· Plan and implement needed changes in cloud environments to drive better observability of usage and improved efficiency.

Desired Skills and Experience
· Configuration management: use Chef and Ansible to effectively manage our infrastructure.
· Infrastructure as code: use Terraform and Azure DevOps CI/CD for automation, containerize our environments (Kubernetes), and leverage cloud technologies to meet our goals.
· Systems: manage, configure, and troubleshoot operating system issues, storage (block and object), networking, security, load balancers, Azure Defender, and Application Gateway.
· Monitoring and instrumentation: implement metrics in Prometheus and Grafana, log management and related systems, and Slack/PagerDuty integrations.
· Engineering practices: availability, reliability, and scalability, as well as disaster recovery.
· Work in a variety of languages: Shell, Ruby, Go, Python.
· Advanced knowledge of cloud services.
· Kubernetes: cluster provisioning, new services, and troubleshooting.
· Prometheus, Thanos, and Grafana: service catalog metrics and recording rules for alerts.
· Log shipping pipelines and incident debugging visualizations.
· Operating system (Linux) configuration, package management, startup, and troubleshooting.
· Block and object storage configuration and debugging.
· Terraform syntax and Azure DevOps CI/CD configuration, pipelines, and jobs.

Perks of working with Tiger Analytics
· Latest Technology: You will get to work on the most advanced technologies like machine learning and artificial intelligence.
· Global Exposure: We work for leading global brands. You will get exposure to global markets while working with international clients.
· Learning & Development: We partner with a host of the biggest learning platforms. You will be encouraged to learn and grow.
· Growth Mindset: We encourage a growth mindset and believe that learning never stops. There is no pressure on you to be a master of everything. With this attitude, we can make significant progress in areas that we know very little about today. Here, you’ll work with people with a collective passion to explore the limitless potential of data to solve complex problems.
· Additional Benefits: Health insurance (self and family), a virtual wellness platform, and fun and knowledge communities.
We believe in equal opportunities for all and invite you to come join us as we build the world’s best AI and advanced analytics team. Our compensation packages are competitive and among the best in the industry. Your designation will be commensurate with expertise and experience.