International Journal of Computer Networks and Applications (IJCNA)

Published By EverScience Publications

ISSN: 2395-0455

Design of an Integrated Model Using Hybrid Autoencoder and LSTM for Fault Tolerance and Load Balancing in Cloud Environments

Nahita Pathania[1], Balraj Singh[2]

[1]School of Computer Science and Engineering, Lovely Professional University, Phagwara, India.

[2]School of Computer Science and Engineering, Lovely Professional University, Phagwara, India.

Abstract

Modern cloud environments have large and complex topologies that demand both fault tolerance and efficient resource usage. Current fault detection and load balancing techniques are often insufficient because of well-known limitations: high false-positive rates, late detection, and large redundancy overheads that frequently become performance bottlenecks. To address these limitations, this work proposes a hybrid fault-tolerant load-balancing framework that integrates several techniques: Hybrid Autoencoder-Based Anomaly Detection (HAAD), Task-Level Replication Using Intelligent Redundancy Allocation (TRA-IRA), and Long Short-Term Memory (LSTM) networks for proactive failure prediction. HAAD detects both known and unknown faults by using unsupervised autoencoders to learn the normal behavior of the system, achieving 97-98% fault detection accuracy. TRA-IRA dynamically allocates redundant replicas based on task priority and real-time resource health predictions, reducing replication overhead by 20% while maintaining a task completion rate of 99.5%. The LSTM network predicts imminent failures by analysing temporal patterns in system metrics, enabling tasks to be migrated up to 45 minutes before a failure with 95-96% prediction accuracy. These techniques are integrated with Adaptive Resource Reallocation via Genetic Algorithm (ARR-GA) for optimal scheduling, and the Batfly Algorithm is used for task management. Together, these approaches increase fault-tolerance strength by 45% and system reliability by 50%, and reduce response time and makespan by 15-20%. The resulting model offers a scalable, dynamic, and robust approach to cloud load balancing that addresses critical gaps in fault tolerance and resource optimization.
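The idea behind HAAD, flagging system behavior that a model trained only on normal operation cannot reconstruct well, can be illustrated with a deliberately simplified sketch. This is not the authors' HAAD model. It assumes a single-hidden-unit linear autoencoder with tied weights trained by plain gradient descent, two hypothetical metrics (CPU and memory load) that move together under normal operation, and a threshold of mean plus three standard deviations of the reconstruction error on normal data. All function names are illustrative.

```python
import random


# Hypothetical normal data: two metrics (e.g. CPU and memory load)
# that rise and fall together when the system is healthy.
def make_normal_samples(n=200, seed=0):
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        cpu = rng.uniform(0.2, 0.8)
        mem = cpu + rng.uniform(-0.02, 0.02)  # small noise around the normal pattern
        samples.append((cpu, mem))
    return samples


def train_linear_autoencoder(samples, lr=1.0, epochs=400):
    """Tied-weight linear autoencoder with one hidden unit (a simplification).

    Encoder: z = w . (x - mu); decoder: x_hat = mu + z * w.
    Trained by full-batch gradient descent on mean squared reconstruction error.
    """
    n, dim = len(samples), len(samples[0])
    mu = [sum(s[j] for s in samples) / n for j in range(dim)]
    centered = [[s[j] - mu[j] for j in range(dim)] for s in samples]
    w = [1.0] + [0.0] * (dim - 1)  # arbitrary unit-norm starting weights
    for _ in range(epochs):
        grad = [0.0] * dim
        for x in centered:
            z = sum(wj * xj for wj, xj in zip(w, x))        # encode
            r = [xj - z * wj for xj, wj in zip(x, w)]       # residual x - x_hat
            rw = sum(rj * wj for rj, wj in zip(r, w))
            for j in range(dim):
                # gradient of ||x - w w^T x||^2 is -2 (z r + (r . w) x)
                grad[j] -= 2.0 * (z * r[j] + rw * x[j]) / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return mu, w


def reconstruction_error(x, mu, w):
    c = [xj - mj for xj, mj in zip(x, mu)]
    z = sum(wj * cj for wj, cj in zip(w, c))
    return sum((cj - z * wj) ** 2 for cj, wj in zip(c, w))


def fit_threshold(samples, mu, w, k=3.0):
    """Threshold = mean + k * std of reconstruction error on normal data."""
    errs = [reconstruction_error(s, mu, w) for s in samples]
    mean = sum(errs) / len(errs)
    std = (sum((e - mean) ** 2 for e in errs) / len(errs)) ** 0.5
    return mean + k * std


def is_anomalous(x, mu, w, threshold):
    return reconstruction_error(x, mu, w) > threshold


if __name__ == "__main__":
    normal = make_normal_samples()
    mu, w = train_linear_autoencoder(normal)
    threshold = fit_threshold(normal, mu, w)
    for point in [(0.7, 0.7), (0.9, 0.1)]:
        print(point, is_anomalous(point, mu, w, threshold))
```

With this synthetic data, the point (0.9, 0.1), which breaks the learned CPU/memory relationship, is flagged, while (0.7, 0.7) is not. No labelled faults are needed, which is why this family of methods can catch previously unseen fault types; a realistic detector would use a deeper (possibly hybrid) autoencoder over many metrics.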
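The principle behind TRA-IRA, spending redundancy according to task priority and predicted resource health, can likewise be sketched under explicit assumptions. The rule below is a hypothetical stand-in, not the paper's allocation algorithm. It assumes each host has a predicted probability of failing before the task completes (for example, produced by a failure predictor such as the LSTM), that host failures are independent, and that each priority level maps to a target probability that at least one replica survives. Replicas are placed on the healthiest hosts until one minus the product of the chosen hosts' failure probabilities meets the target, or a replica cap is reached.

```python
# Hypothetical priority levels and reliability targets (not taken from the paper).
DEFAULT_TARGETS = {"low": 0.90, "medium": 0.99, "high": 0.9995}


def allocate_replicas(host_fail_prob, priority, targets=DEFAULT_TARGETS, max_replicas=3):
    """Pick hosts for a task's replicas, healthiest host first.

    host_fail_prob: dict mapping host id -> predicted probability that the
        host fails before the task completes.
    Assumes independent host failures, so the task survives unless every
    chosen host fails: P(success) = 1 - product of chosen failure probabilities.
    """
    target = targets[priority]
    chosen = []
    p_all_fail = 1.0
    for host in sorted(host_fail_prob, key=host_fail_prob.get):
        chosen.append(host)
        p_all_fail *= host_fail_prob[host]
        # Stop as soon as the reliability target is met or the cap is hit.
        if 1.0 - p_all_fail >= target or len(chosen) >= max_replicas:
            break
    return chosen


if __name__ == "__main__":
    hosts = {"h1": 0.05, "h2": 0.20, "h3": 0.02, "h4": 0.10}
    for prio in ("low", "medium", "high"):
        print(prio, allocate_replicas(hosts, prio))
```

With these illustrative numbers, a low-priority task gets a single replica on the most reliable host, a medium-priority task gets two, and a high-priority task gets three. Stopping at the first sufficient replica is what keeps replication overhead down for tasks that do not need it.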

Index Terms

Fault Tolerance

Autoencoder

LSTM Networks

Load Balancing

Redundancy Allocation

Scenarios
