ONE Data Lake & AWS S3 — Enhancing Data Management and Analytics — Part 2

 

In February, we introduced the ONE Data Lake as part of our ONES 2.1 release, highlighting its integration capabilities with Splunk and AWS. In this blog post, we’ll look at how the Data Lake integrates specifically with Amazon S3.

A data lake functions as a centralized repository designed to store vast amounts of structured, semi-structured, and unstructured data on a large scale. These repositories are typically constructed using scalable, distributed, cloud-based storage systems such as Amazon S3, Azure Data Lake Storage, or Google Cloud Storage. A key advantage of a data lake is its ability to manage large volumes of data from various sources, providing a unified storage solution that facilitates data exploration, analytics, and informed decision-making.

Aviz ONE-Data Lake acts as a platform that enables the migration of on-premises network data to cloud storage. It includes metrics that capture operational data across the network’s control plane, data plane, system, platform, and traffic. As an enhanced version of the Aviz Open Networking Enterprise Suite (ONES), ONE-Data Lake stores the metrics previously used in ONES in the cloud.

Why AWS S3?

Amazon S3 (Simple Storage Service) is often used as a core component of a data lake architecture, where it stores structured, semi-structured, and unstructured data. This enables comprehensive data analytics and exploration across diverse data sources. S3 is widely used for several reasons:

1. Effective Integration and Ecosystem:

S3 integrates seamlessly with a wide range of AWS services and third-party tools, significantly enhancing data processing, analytics, and machine learning workflows. This integration allows for efficient data ingestion, real-time analytics, advanced data processing, and robust machine learning model training and deployment, creating a cohesive ecosystem for comprehensive data management and analysis.

2. Durability and Reliability:

S3 is designed for eleven nines (99.999999999%) of data durability, ensuring that your data is exceptionally safe and consistently accessible. This level of durability is achieved by redundantly storing data across multiple geographically dispersed facilities, providing robust protection against data loss and guaranteeing high availability.

3. Security:

S3 offers comprehensive security and compliance capabilities, providing a robust framework for safeguarding data and ensuring regulatory adherence. This includes advanced data encryption, both at rest and in transit, ensuring that sensitive information remains protected throughout its lifecycle. Additionally, S3 provides granular access management tools, such as AWS Identity and Access Management (IAM), bucket policies, and access control lists (ACLs), allowing fine-tuned control over who can access and modify data. These features, combined with compliance certifications for various industry standards (such as GDPR, HIPAA, and SOC), make S3 a secure and reliable choice for data storage in highly regulated environments.
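To make the access-management point concrete, the sketch below builds a bucket policy that denies any request arriving over plain HTTP, a common S3 hardening step. The bucket name is a placeholder; with boto3, the serialized policy would be applied via `put_bucket_policy`.

```python
import json


def tls_only_bucket_policy(bucket_name):
    """Build a bucket policy that denies any non-TLS (plain HTTP) request.

    The returned dict can be serialized with json.dumps() and applied with
    boto3: s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(...)).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                # aws:SecureTransport is "false" when the request is plain HTTP
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# "example-datalake-bucket" is an illustrative placeholder name.
policy = tls_only_bucket_policy("example-datalake-bucket")
print(json.dumps(policy, indent=2))
```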

4. Scalability:

S3’s capability to handle virtually unlimited amounts of data makes it an unparalleled choice for building and maintaining expansive data lakes that require storing massive volumes of information. This scalability empowers organizations to seamlessly scale their storage needs without upfront investments in infrastructure, accommodating growing data demands effortlessly. This capability is crucial for enterprises seeking to centralize and manage diverse data types, enabling advanced analytics, machine learning, and other data-driven initiatives with agility and reliability.

5. Cost Effectiveness:

S3 provides flexible pricing models and a variety of storage classes to optimize costs based on data access patterns. Users can take advantage of storage classes like S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, S3 One Zone-IA, and S3 Glacier to manage expenses efficiently.
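As an illustration, the mapping below pairs expected access patterns with S3 storage classes; the resulting dictionary matches the keyword arguments boto3's `put_object` accepts. The pattern labels are our own shorthand, not AWS terminology.

```python
# Illustrative mapping from expected access pattern to S3 storage class.
STORAGE_CLASSES = {
    "frequent": "STANDARD",            # hot data, lowest latency
    "unknown": "INTELLIGENT_TIERING",  # let S3 move data between tiers
    "infrequent": "STANDARD_IA",       # cheaper storage, retrieval fee
    "archival": "GLACIER",             # lowest cost, slow retrieval
}


def upload_args(bucket, key, access_pattern):
    """Build the keyword arguments for an S3 upload (boto3 put_object),
    picking a storage class that matches the expected access pattern."""
    return {
        "Bucket": bucket,
        "Key": key,
        "StorageClass": STORAGE_CLASSES[access_pattern],
    }


# With boto3 this would be invoked as: s3.put_object(Body=data, **upload_args(...))
print(upload_args("example-datalake-bucket", "metrics/2024/traffic.parquet", "infrequent"))
```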

6. Data Management and Analytics:

S3 offers robust data management capabilities, including versioning, lifecycle policies, and replication, which streamline data governance and archival processes. These features ensure data integrity, compliance, and resilience across various use cases, and help businesses unlock the full potential of their data assets, supporting diverse applications such as predictive analytics, business intelligence, and real-time reporting.

7. Disaster Recovery and Backup:

S3’s cross-region replication and lifecycle policies make it a strong foundation for disaster recovery strategies, ensuring data redundancy and resilience. Lifecycle policies enable automated management of data throughout its lifecycle, facilitating seamless transitions between storage tiers and automated deletion of outdated or unnecessary data. Together, these features make S3 a reliable backup solution that enhances data durability and availability, keeping critical data securely stored and accessible even in unforeseen circumstances.
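A minimal sketch of such a lifecycle policy, expressed as the dictionary boto3's `put_bucket_lifecycle_configuration` expects. The prefix and day thresholds are illustrative defaults, not recommendations.

```python
def archival_lifecycle_rule(prefix, ia_after_days=30,
                            glacier_after_days=90, expire_after_days=365):
    """Build a lifecycle configuration that tiers objects under `prefix`
    down to cheaper storage classes over time and finally deletes them.

    Applied with boto3 as:
      s3.put_bucket_lifecycle_configuration(
          Bucket="...", LifecycleConfiguration=archival_lifecycle_rule("..."))
    """
    return {
        "Rules": [
            {
                "ID": "tier-and-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Move to Infrequent Access, then Glacier, as data cools.
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                # Delete objects once they age out entirely.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


print(archival_lifecycle_rule("traffic/"))
```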

Integrating S3 with ONES:

The following steps are involved in integrating the S3 cloud service with ONES:

1. Mapping the S3 instance to the ONES server

To integrate the S3 service with ONES, follow these steps:

  • Configure S3 Instances: Set up the S3 instances on the ONES cloud page to start pushing metrics to the designated cloud endpoint.
  • Provide Necessary Details: The following information is required for the integration:
      • ARN Role: The unique identifier for the role that grants permissions to access specific AWS resources, including S3 buckets
      • Region: The AWS region where your S3 bucket is located
      • Bucket Name: The globally unique name of your S3 bucket
      • External ID (Optional): An additional security measure used when granting cross-account access to IAM roles

By accurately providing these details, you can effectively configure and integrate the S3 service with ONES, facilitating smooth metric collection and analysis.
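ONES performs the cross-account access itself, but as a rough sketch of how an ARN role and optional external ID are typically consumed, the function below builds the parameters for an STS `AssumeRole` call. The session name is an arbitrary placeholder.

```python
def assume_role_request(role_arn, external_id=None,
                        session_name="ones-datalake-upload"):
    """Build the parameters for an STS AssumeRole call that would grant a
    service cross-account access to a customer's S3 bucket.

    With boto3: sts.assume_role(**assume_role_request(...)) returns
    temporary credentials scoped to the role's permissions.
    """
    params = {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
    }
    # The optional external ID guards against the "confused deputy"
    # problem when a third party assumes a role in your account: the
    # role's trust policy can require this exact value.
    if external_id:
        params["ExternalId"] = external_id
    return params


# Account ID and role name below are illustrative placeholders.
print(assume_role_request("arn:aws:iam::123456789012:role/ones-s3-access",
                          external_id="ones-ext-42"))
```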

2. Managing the created Instance through ONES:

The cloud instance created within ONES offers several management options. Users can update the integration settings, pause and resume metric uploads to the cloud, and delete the integration when it is no longer needed. These features make it easy for users to maintain and manage their cloud endpoint integrations effectively.

3. User-defined metric updates:

The end user has the flexibility to select which metrics from their ONES-monitored network should be uploaded to the designated cloud service. The ONES 2.1 release supports various metric categories, including Traffic Statistics, ASIC Capacity, Device Health, and Inventory. Administrators can select or deselect metrics from the available list within these categories according to their preferences.

4. Multi-vendor support

The metric upload is not limited to any particular hardware or network operating system (NOS). ONE-Data Lake’s data collection capability extends across various network operating systems, including Cisco NX-OS, Arista EOS, SONiC, and non-SONiC platforms. Data streaming occurs via the gNMI process on SONiC-supported devices and through SNMP on other vendors’ operating systems.

S3 Analytical capabilities:

Analyzing data stored in an S3 bucket can be accomplished through various methods, each leveraging different AWS services and tools. Here are some key methods:

Amazon Athena:

Description: A serverless interactive query service that allows you to run SQL queries directly against data stored in S3.

Use Case: Ad-hoc querying, data exploration, and reporting.

Example: Querying log files, CSVs, JSON, or Parquet files stored in S3 without setting up a database.
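For example, an Athena query can be launched through the `StartQueryExecution` API; the builder below assembles its parameters. The `traffic_stats` table, `network_metrics` database, and results bucket are hypothetical names used for illustration.

```python
def athena_query_request(sql, database, output_s3_uri):
    """Build the parameters for Athena's StartQueryExecution API.

    With boto3: athena.start_query_execution(**athena_query_request(...))
    returns a QueryExecutionId whose results land in output_s3_uri.
    """
    return {
        "QueryString": sql,
        # Database that the unqualified table names in `sql` resolve against.
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results back to S3 at this location.
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


request = athena_query_request(
    "SELECT device, AVG(rx_bytes) AS avg_rx FROM traffic_stats GROUP BY device",
    database="network_metrics",
    output_s3_uri="s3://example-athena-results/",
)
print(request)
```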

AWS Glue:

Description: A managed ETL (Extract, Transform, Load) service that helps prepare and transform data for analytics.

Use Case: Data preparation, cleaning, and transformation.

Example: Cleaning raw data stored in S3 and transforming it into a more structured format for analysis.
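As a sketch, such a Glue job can be kicked off with the `StartJobRun` API, passing the raw and cleaned S3 locations as job arguments (Glue passes `Arguments` keys, prefixed with `--`, through to the job script). The job name and paths here are hypothetical.

```python
def glue_job_run_request(job_name, source_s3, target_s3):
    """Build the parameters for Glue's StartJobRun API.

    With boto3: glue.start_job_run(**glue_job_run_request(...)).
    The argument names are defined by the job script itself; Glue
    requires them to be prefixed with '--'.
    """
    return {
        "JobName": job_name,
        "Arguments": {
            "--source_path": source_s3,   # raw telemetry to read
            "--target_path": target_s3,   # cleaned output location
        },
    }


print(glue_job_run_request("clean-network-telemetry",
                           "s3://example-datalake-bucket/raw/",
                           "s3://example-datalake-bucket/clean/"))
```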

Amazon SageMaker:

Description: A fully managed service for building, training, and deploying machine learning models.

Use Case: Machine learning and predictive analytics.

Example: Training machine learning models using large datasets stored in S3 and deploying them for inference.

Third-Party Tools:

Description: Numerous third-party tools integrate with S3 to provide additional analytical capabilities.

Use Case: Specialized data analysis, data science, and machine learning.

Example: Using tools like Databricks, Snowflake, or Domo to analyze and visualize data stored in S3.

Custom Applications:

Description: Developing custom applications or scripts that use AWS SDKs to interact with S3.

Use Case: Tailored data processing and analysis.

Example: Writing Python scripts using the Boto3 library to process data in S3 and generate reports.
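A minimal sketch of such a script: the aggregation is pure Python over the `Contents` entries that boto3's `list_objects_v2` returns, so the S3 call itself is shown only as a comment. Bucket and key names are illustrative.

```python
def summarize_objects(objects):
    """Aggregate an S3 object listing (the 'Contents' field returned by
    boto3's list_objects_v2) into a total-size report per top-level prefix."""
    report = {}
    for obj in objects:
        top_prefix = obj["Key"].split("/", 1)[0]
        report[top_prefix] = report.get(top_prefix, 0) + obj["Size"]
    return report


# With real credentials the listing would come from boto3, e.g.:
#   s3 = boto3.client("s3")
#   page = s3.list_objects_v2(Bucket="example-datalake-bucket")
#   summarize_objects(page.get("Contents", []))
sample = [
    {"Key": "traffic/2024-06-01.parquet", "Size": 1200},
    {"Key": "traffic/2024-06-02.parquet", "Size": 800},
    {"Key": "inventory/devices.json", "Size": 300},
]
print(summarize_objects(sample))  # {'traffic': 2000, 'inventory': 300}
```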

Conclusion:

Aviz ONE-Data Lake serves as the cloud-native iteration of ONES, enabling the storage of network data in cloud repositories. It is cloud-platform agnostic and supports data streaming from major network device manufacturers such as Dell, Mellanox, Arista, and Cisco. Network administrators retain the flexibility to define which metrics are transferred to the cloud endpoint, ensuring customized control over the data storage process.

FAQs

1. What are the benefits of integrating ONE Data Lake with AWS S3 for network data storage?

Answer: Integrating Aviz ONE Data Lake with AWS S3 enables:

  • Centralized cloud storage for network telemetry
  • Unlimited scalability for growing datasets
  • Enhanced security with AWS encryption and IAM controls
  • Durable and highly available storage across regions
  • Flexible analytics through services like AWS Athena, Glue, and SageMaker
    This combination helps enterprises achieve cost-effective, compliant, and powerful data management.

2. How can I configure AWS S3 integration with Aviz ONE Data Lake?

Answer: To set up AWS S3 integration with ONE Data Lake:

  • Provide your ARN role, region, bucket name, and (optionally) external ID
  • Configure your S3 instance on the ONES cloud interface
  • Select desired network metrics (e.g., Traffic Stats, Device Health) for uploading
    This ensures seamless cloud metric collection customized to your organization’s needs.

3. What network telemetry metrics can be uploaded from ONE Data Lake to AWS S3?

Answer: With ONES 2.1, administrators can selectively upload metrics like:

  • Traffic Statistics
  • ASIC Capacity Metrics
  • Device Health and Platform Monitoring
  • Inventory Data
    The flexibility to customize and filter metrics helps optimize storage costs and streamline analytics pipelines.

4. Does ONE Data Lake support multi-vendor telemetry for S3 uploads?

Answer: Yes! Aviz ONE Data Lake collects and streams telemetry across multiple NOS platforms, including:

  • Cisco NX-OS
  • Arista EOS
  • SONiC
  • Cumulus Linux and other non-SONiC devices
    It uses gNMI for SONiC and SNMP for other vendors, ensuring multi-vendor support without limitations.

5. How can AWS services like Athena, Glue, and SageMaker enhance analytics on S3-stored network data?

Answer:

  • Amazon Athena enables SQL-based querying directly on raw S3 data (no database setup needed).
  • AWS Glue automates ETL workflows, prepping raw network telemetry for structured analytics.
  • Amazon SageMaker builds ML models using S3-stored datasets for predictive network optimization.
    Together, these services transform raw network data into actionable insights and machine learning opportunities.
