System Administrator

  • EGoli
  • Frostbyte Digital
Job Overview:

We are looking for a Systems Administrator who will be responsible for managing and maintaining cloud-based infrastructure on Amazon Web Services (AWS). This role involves ensuring the reliability, performance, and security of AWS environments, as well as providing support and optimization for cloud resources. The ideal candidate will have a strong background in AWS services, cloud architecture, and systems administration.

Roles and Responsibilities:

  • Support and maintain IT infrastructure.
  • Support BI tools and processes, including data extraction, transformation, and loading.
  • Write queries to pull reports from Amazon Aurora.
  • Design, deploy, and manage AWS infrastructure components.
  • Monitor and maintain cloud infrastructure for optimal performance, availability, and scalability.
  • Implement and manage AWS cloud resources using infrastructure-as-code tools such as CloudFormation or Terraform.
  • Ensure cloud infrastructure security through proper configuration of security groups, network ACLs, IAM roles, and policies.
  • Develop and maintain scripts and automation tools for routine tasks and deployment processes using Python, Bash, or PowerShell.
  • Optimize cloud infrastructure costs through effective resource management and utilization.
  • Troubleshoot and resolve issues related to cloud infrastructure, including performance bottlenecks and failures.
  • Implement and manage backup and disaster recovery solutions to ensure data integrity and availability.
  • Conduct regular tests of backup and disaster recovery procedures.
  • Maintain detailed documentation of cloud infrastructure, configurations, and procedures.
  • Provide technical support and guidance to development teams regarding AWS best practices and resource usage.
  • Track system uptime and availability, and promote incremental increases in change velocity.
  • Drive innovation, prioritization, and engineering of new cloud capabilities to bolster the operating model.
  • Deploy, monitor, and support mission-critical cloud-based applications within SLA.
  • Participate in on-call support, ensuring stability and performance of production environments.
  • Respond to monitoring alerts according to defined playbooks and procedures.
  • Participate in Post Incident Reviews and discussions.
  • Build automation to prevent problem recurrence; eventually automate the response to all non-exceptional service conditions.
  • Be a subject matter expert in reducing and resolving production incidents by identifying preventive controls and driving proactive efforts.
  • Implement and continuously refine high-quality processes and standards for infrastructure and technical operations that support agile practices and a fast-paced environment where customer needs are dynamic and evolving.
  • Develop and maintain relevant IT infrastructure policies, procedures, and governance.
  • Generate and maintain all service and product documentation (e.g. SOPs, change control).
  • Manage existing system configurations and deploy new ones, including Windows Server and virtualization.
  • Support and implement security tools, policies, and procedures in conjunction with the company's security team.
  • Maintain and support patch management.
  • Produce daily and monthly operational reports.
  • Produce incident reports.
  • Produce trending and correlation reports.
  • Produce root cause analyses of incidents.

Qualifications:

  • B.Sc. (Computer Engineering and/or Computer Science) or similar qualification.
  • AWS Certification.

Experience:

  • Minimum of 5 years of experience working with AWS services and infrastructure.
  • Strong knowledge of and hands-on experience with AWS services (EC2, S3, RDS, Lambda, VPC, IAM, etc.).
  • Strong experience with MySQL and Amazon Aurora.
  • Experience with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI/CD).
  • Knowledge of containerization technologies such as Docker and Kubernetes.
  • Understanding of networking concepts and configurations within AWS.
  • Familiarity with database management and optimization in AWS.
  • Experience with monitoring, logging, and alerting tools (CloudWatch, AWS Config, etc.).
  • Proficiency in monitoring, logging, and metrics tools to effectively monitor the health and performance of AWS resources.
  • Excellent written and verbal communication skills.