Get started with RStudio on Amazon SageMaker | AWS Machine Learning Blog

Today, we’re excited to announce RStudio on Amazon SageMaker, the industry’s first fully managed RStudio integrated development environment (IDE) in the cloud. You can now bring your current RStudio licenses and migrate your self-managed RStudio environments to Amazon SageMaker in a few simple steps.

RStudio is one of the most popular IDEs among R developers for data science, statistical analysis, and machine learning (ML). However, building, securing, scaling, and maintaining RStudio yourself can be tedious and cumbersome. RStudio on SageMaker offers a quick and easy way to migrate your self-managed RStudio environments to the cloud. You can bring your current RStudio licenses to SageMaker through AWS License Manager. You can secure your RStudio environment using several built-in SageMaker capabilities, such as applying fine-grained access controls via AWS Identity and Access Management (IAM), restricting network traffic to your VPC, and automatically encrypting data at rest. You can also monitor logs through Amazon CloudWatch and manage billing through AWS Billing. Meanwhile, you can author R code in the familiar RStudio interface with fully elastic compute that can be dialed up and down without interrupting your work, thereby accelerating model development and improving productivity.

SageMaker already comes with Amazon SageMaker Studio notebooks for building and deploying ML models. With RStudio on SageMaker, you can unify your Python and R data science teams in one place, enabling closer collaboration and centralized administration of your data science organization. Moreover, developers proficient with both R and Python can freely switch between RStudio and Studio notebooks. All of your work, including code, datasets, repositories, and other artifacts, is synchronized between the two environments through the common underlying Amazon Elastic File System (Amazon EFS) storage.

In this post, we show you how to bring your RStudio Workbench licenses into License Manager, how to set up RStudio on SageMaker, and how to administer RStudio on SageMaker.

Prerequisites

RStudio on SageMaker requires an IAM execution role that has permissions for License Manager and CloudWatch. The required actions are listed in the following policy. For more information, see Create DomainExecution Role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "license-manager:ExtendLicenseConsumption",
                "license-manager:ListReceivedLicenses",
                "license-manager:GetLicense",
                "license-manager:CheckoutLicense",
                "license-manager:CheckInLicense",
                "logs:CreateLogDelivery",
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:DeleteLogDelivery",
                "logs:Describe*",
                "logs:GetLogDelivery",
                "logs:GetLogEvents",
                "logs:ListLogDeliveries",
                "logs:PutLogEvents",
                "logs:PutResourcePolicy",
                "logs:UpdateLogDelivery", 
                "sagemaker:CreateApp"
            ],
            "Resource": "*"
        }
    ]
} 
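If you would rather create the role from a script than from the console, the policy above can be assembled and checked in Python first. The following is a minimal sketch under stated assumptions: the function names, role name, and policy name are hypothetical, the trust policy assumes the role is assumed by the SageMaker service principal, and the `create_role` helper needs boto3 and AWS credentials to run.

```python
import json

# Actions copied from the policy listed above.
RSTUDIO_ACTIONS = [
    "license-manager:ExtendLicenseConsumption",
    "license-manager:ListReceivedLicenses",
    "license-manager:GetLicense",
    "license-manager:CheckoutLicense",
    "license-manager:CheckInLicense",
    "logs:CreateLogDelivery",
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:DeleteLogDelivery",
    "logs:Describe*",
    "logs:GetLogDelivery",
    "logs:GetLogEvents",
    "logs:ListLogDeliveries",
    "logs:PutLogEvents",
    "logs:PutResourcePolicy",
    "logs:UpdateLogDelivery",
    "sagemaker:CreateApp",
]


def build_permissions_policy() -> dict:
    """Return the permissions policy shown above as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(RSTUDIO_ACTIONS), "Resource": "*"}
        ],
    }


def build_trust_policy() -> dict:
    """Trust policy letting SageMaker assume the role (an assumption of this sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_role(role_name: str, policy_name: str) -> str:
    """Create the role with an inline policy. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders above work without boto3

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_trust_policy()),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_permissions_policy()),
    )
    return role["Role"]["Arn"]


if __name__ == "__main__":
    # Print the policy document for review without calling AWS.
    print(json.dumps(build_permissions_policy(), indent=4))
```

Calling `create_role("my-rstudio-execution-role", "rstudio-license-and-logs")` (both names are placeholders) would return the ARN of the new role.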

We prepared an AWS CloudFormation stack that creates the required IAM execution role in your account.

  1. Choose Launch Stack.

The link takes you to the us-east-1 Region, but you can change to your preferred Region. IAM roles are global resources, so you can access the role in any Region.

  1. In the Specify template section, choose Next.
  2. In the Specify stack details section, for Stack name, enter a name and choose Next.
  3. In the Configure stack options section, choose Next.
  4. In the Review section, select I acknowledge that AWS CloudFormation might create IAM resources and choose Next.
  5. When the stack status changes to CREATE_COMPLETE, go to the Resources tab to find the IAM role created.

RStudio on SageMaker can only be configured by creating a new SageMaker domain. You need to use an AWS account and Region that doesn’t have an existing SageMaker domain.

Bring your RStudio license to License Manager

RStudio Workbench is a paid product and requires that each user is appropriately licensed. SageMaker is not a reseller of RStudio Workbench licenses. RStudio on SageMaker requires a product license from RStudio. For customers of RStudio Workbench Enterprise, licenses are issued at no additional cost. To use RStudio on SageMaker, you need to bring your RStudio license to License Manager by completing the following steps:

  1. If you don’t have an RStudio Workbench license, you can purchase one on RStudio Pricing or by contacting RStudio Sales ([email protected]).
  2. To add RStudio on SageMaker to your existing RStudio Workbench Enterprise purchase, or to convert an RStudio Workbench Standard license to SageMaker, contact your RStudio Sales Representative (or [email protected]), who will send you the appropriate electronic order form.

RStudio grants your RStudio Workbench licenses to your AWS accounts through License Manager in the US East (N. Virginia) Region. You can expect the license grant process to complete within 3 business days after you share the AWS account IDs with RStudio.

  1. After the license is granted, you receive a notification from RStudio with instructions to log in to the License Manager console and open the Granted licenses page in the US East (N. Virginia) Region to accept the license grant.
  2. If this is your first time using License Manager, choose Create customer managed license.
  3. Select I grant AWS License Manager the required permissions, and choose Grant permissions.
  4. On the Granted licenses page, select the license grant with RSW-SageMaker as the product name and choose View.
  5. On the license detail page, choose Accept & activate license.

Now you have accepted your RStudio Workbench license into AWS. We’re ready to create RStudio on SageMaker. Although the license grant occurs in the US East (N. Virginia) Region, your license can be consumed by RStudio on SageMaker in any Region that supports the feature.
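To confirm from a script that the license is visible in your account, you can list the licenses your account has received and filter on the product name. This is a minimal sketch, not a required step: the helper names are hypothetical, the sample records in the usage note are invented for illustration, and `fetch_received_licenses` needs boto3 and AWS credentials. It relies on the `license-manager:ListReceivedLicenses` permission and on each returned license carrying a `ProductName` field.

```python
from typing import Iterable, List

# Product name shown on the Granted licenses page, per the steps above.
PRODUCT_NAME = "RSW-SageMaker"


def find_rstudio_licenses(licenses: Iterable[dict]) -> List[dict]:
    """Keep only the license records whose product name is RSW-SageMaker."""
    return [lic for lic in licenses if lic.get("ProductName") == PRODUCT_NAME]


def fetch_received_licenses(region: str = "us-east-1") -> List[dict]:
    """Fetch the first page of received licenses. Requires boto3 and AWS credentials.

    The Region defaults to US East (N. Virginia), where the grant is made.
    """
    import boto3  # imported lazily so the filter above works without boto3

    client = boto3.client("license-manager", region_name=region)
    return client.list_received_licenses()["Licenses"]
```

For example, `find_rstudio_licenses([{"ProductName": "RSW-SageMaker"}, {"ProductName": "Other"}])` keeps only the first record. With credentials configured, `find_rstudio_licenses(fetch_received_licenses())` applies the same filter to your account.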

Create a SageMaker domain with RStudio

You can configure RStudio on SageMaker as part of the multi-step SageMaker domain creation process on the AWS Management Console. You can also perform the steps using the AWS Command Line Interface (AWS CLI) by following the instructions on the Create an Amazon SageMaker Domain with RStudio using the AWS CLI page. To create your domain on the console, complete the following steps:

  1. On the SageMaker console, on the SageMaker Domain page, choose Standard setup, and choose Configure.
  2. For Authentication method, choose either AWS SSO or AWS IAM.

In this post, we choose AWS IAM. To use AWS Single Sign-On (AWS SSO), follow the steps in Set Up AWS SSO for Use with Amazon SageMaker Studio.

  1. Under Permission, choose or create a default IAM execution role for your SageMaker domain.

RStudio requires a separate IAM execution role to access License Manager and publish logs to CloudWatch.

  1. Select your preferred Network and Storage options.

The network and encryption settings apply to both RStudio and Studio.

  1. Choose Next.
  2. In Step 2: Studio settings, you can configure notebook sharing, SageMaker projects, and Amazon SageMaker JumpStart for your Studio notebooks.
  3. In Step 3: RStudio settings, you can configure the RStudio Workbench license, server instances, permission, RStudio Connect, and RStudio Package Manager.

Studio automatically detects an RStudio Workbench license after it’s added and accepted in License Manager. For example, in the following screenshot, three RStudio Workbench seats are detected and added.

You can choose the instance type for the RStudio server, which is shared by all users. ml.t3.medium is free to use. For more information about how to choose an instance type, see the RStudioServerPro instance type page. Note that this is not the instance where your data scientists run ML workloads. Later, when we create a user profile to use RStudio on SageMaker, a data scientist can create R sessions with more instance types to choose from.

  1. For Permission, select the role <cloudformation-stack-name>-RStudio-ExecutionRole-xxxxxx that we created previously.

This role allows RStudio on SageMaker to access RStudio licenses in License Manager and publish logs to CloudWatch.

  1. Optionally, you can configure default RStudio Connect and RStudio Package Manager URLs for all user profiles if you have the two servers running.
    1. For RStudio Connect, enter your RStudio Connect server URL.
    2. For RStudio Package Manager, enter a CRAN or Bioconductor repository.

To host RStudio Connect and RStudio Package Manager in AWS, you can learn more in the post Host RStudio Connect and Package Manager for ML development in RStudio on Amazon SageMaker.

  1. Choose Submit to create the domain.

The domain creation takes a couple of minutes. When it’s complete, we can add users for data scientists to access RStudio on SageMaker.
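The console choices above map onto the parameters of the SageMaker `CreateDomain` API. As a rough sketch, assuming you script the domain with boto3 instead of the console, the request could be assembled as follows. The builder function and its argument names are hypothetical, and the accepted instance type values for the RStudio server should be checked against the RStudioServerPro instance type page before use.

```python
from typing import Optional, Sequence


def build_create_domain_request(
    domain_name: str,
    vpc_id: str,
    subnet_ids: Sequence[str],
    default_user_role_arn: str,
    rstudio_role_arn: str,
    instance_type: Optional[str] = None,
    connect_url: Optional[str] = None,
    package_manager_url: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for the SageMaker CreateDomain call."""
    # Step 3 of the console flow: RStudio settings.
    rstudio_settings = {"DomainExecutionRoleArn": rstudio_role_arn}
    if instance_type:
        rstudio_settings["DefaultResourceSpec"] = {"InstanceType": instance_type}
    if connect_url:
        rstudio_settings["RStudioConnectUrl"] = connect_url
    if package_manager_url:
        rstudio_settings["RStudioPackageManagerUrl"] = package_manager_url

    return {
        "DomainName": domain_name,
        "AuthMode": "IAM",  # this post uses AWS IAM; "SSO" is the other option
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        # Default execution role for user profiles in the domain.
        "DefaultUserSettings": {"ExecutionRole": default_user_role_arn},
        # Separate role that RStudio uses for License Manager and CloudWatch.
        "DomainSettings": {"RStudioServerProDomainSettings": rstudio_settings},
    }


def create_domain(request: dict) -> str:
    """Send the request. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    response = boto3.client("sagemaker").create_domain(**request)
    return response["DomainArn"]
```

Keeping the request builder separate from the API call makes it easy to review the settings before anything is created in your account.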

Create a SageMaker domain user profile

Creating a user in a SageMaker domain allows access to both Studio and RStudio on SageMaker. You can configure both on the SageMaker console. If you prefer to use the AWS CLI to set up a user, see the Manage users page. To enable RStudio for a user via the console, complete the following steps:

  1. On the SageMaker Domain page, choose Add user.
  2. For Name, enter a user name.
  3. For Default execution role, choose an existing role or create a new one with specific access to Amazon Simple Storage Service (Amazon S3) buckets.
  4. Choose Next.
  5. In Step 2: Studio settings, you can configure access to SageMaker project templates and JumpStart. We don’t use these features in this post, so you can keep the defaults; you can always edit them later.
  6. Choose Next to proceed.

In Step 3: RStudio settings, Studio automatically detects the RStudio Workbench licenses added to the domain. For License Authorization, you can choose from the following levels:

    • RStudio Admin – Has access to the RStudio IDE and RStudio administrative dashboard
    • RStudio User – Has access to the RStudio IDE
    • Unauthorized – Doesn’t have access to the RStudio IDE

Note that all options grant access to Studio.

  1. Choose either RStudio Admin or RStudio User and choose Submit to proceed.

The user profile creation takes less than a minute.

  1. To open RStudio on SageMaker, choose the Launch app drop-down menu for the user in the user list and choose RStudio.
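The three license authorization levels correspond to the `RStudioServerProAppSettings` of a user profile in the SageMaker API. The following sketch shows one way to build a `CreateUserProfile` request for each level. The mapping from the console labels to API values is our reading of the API and should be treated as an assumption, and the function and dictionary names are hypothetical.

```python
# Console label -> RStudioServerProAppSettings (assumed mapping).
AUTHORIZATION_SETTINGS = {
    "RStudio Admin": {"AccessStatus": "ENABLED", "UserGroup": "R_STUDIO_ADMIN"},
    "RStudio User": {"AccessStatus": "ENABLED", "UserGroup": "R_STUDIO_USER"},
    "Unauthorized": {"AccessStatus": "DISABLED"},
}


def build_create_user_profile_request(
    domain_id: str,
    user_name: str,
    execution_role_arn: str,
    authorization: str,
) -> dict:
    """Assemble keyword arguments for the SageMaker CreateUserProfile call."""
    if authorization not in AUTHORIZATION_SETTINGS:
        raise ValueError(f"Unknown license authorization: {authorization!r}")
    return {
        "DomainId": domain_id,
        "UserProfileName": user_name,
        "UserSettings": {
            "ExecutionRole": execution_role_arn,
            # Copy so callers cannot mutate the shared mapping.
            "RStudioServerProAppSettings": dict(AUTHORIZATION_SETTINGS[authorization]),
        },
    }


def create_user_profile(request: dict) -> str:
    """Send the request. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    response = boto3.client("sagemaker").create_user_profile(**request)
    return response["UserProfileArn"]
```

Every level still creates a user profile, which matches the note that all options grant access to Studio.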

Administer RStudio on Amazon SageMaker

As an administrator, you can administer and monitor your data scientists’ usage of RStudio on SageMaker via the SageMaker console, the RStudio administrative dashboard, and CloudWatch.

SageMaker console

An administrator who has the necessary IAM permissions, such as AdministratorAccess, can use the SageMaker console to manage domain settings for both Studio and RStudio on SageMaker, such as networking, security, and encryption. An administrator can also create users and assign proper authorization levels. Furthermore, an administrator can delete the apps created by a user, edit a user’s IAM execution role, and delete a user via the SageMaker console. An app in a SageMaker domain refers to an application that supports the reading and execution experience of the user’s notebooks, terminals, and consoles. Specifically, for RStudio on SageMaker, each R session is an RSessionGateway app. To edit a user, complete the following steps:

  1. On the SageMaker Domain page, choose a user.

You can see a list of apps created by the user.

  1. Delete an app by choosing Delete app.

When all apps are deleted and in the deleted state, the Edit button is available.

  1. Choose Edit to edit permissions and other configurations, or delete the user profile.
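The same delete-then-edit sequence can be checked from a script. The sketch below, with hypothetical helper names, takes app records shaped like the output of the SageMaker `ListApps` API and reports which apps still block editing, following the rule above that every app must be in the Deleted state. How apps in other terminal states are treated is not covered by this post, so the sketch counts anything not Deleted as blocking.

```python
from typing import Iterable, List


def apps_blocking_edit(apps: Iterable[dict]) -> List[dict]:
    """Return the apps that are not yet in the Deleted state."""
    return [app for app in apps if app.get("Status") != "Deleted"]


def can_edit_user(apps: Iterable[dict]) -> bool:
    """The Edit button is available only when no app blocks editing."""
    return not apps_blocking_edit(apps)


def delete_running_apps(domain_id: str, user_profile_name: str) -> List[str]:
    """Delete the user's InService apps (first page only).

    Requires boto3 and AWS credentials. Returns the names of the apps
    for which deletion was requested.
    """
    import boto3  # imported lazily so the helpers above work without boto3

    sm = boto3.client("sagemaker")
    apps = sm.list_apps(
        DomainIdEquals=domain_id, UserProfileNameEquals=user_profile_name
    )["Apps"]
    requested = []
    for app in apps:
        if app["Status"] == "InService":
            sm.delete_app(
                DomainId=domain_id,
                UserProfileName=user_profile_name,
                AppType=app["AppType"],  # for example, RSessionGateway
                AppName=app["AppName"],
            )
            requested.append(app["AppName"])
    return requested
```

Deletion is asynchronous, so a script would typically call `list_apps` again and check `can_edit_user` before editing the profile.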

RStudio administrative dashboard

An administrator with the RStudio Admin authorization level can access the RStudio administrative dashboard to monitor active sessions and the compute resources consumed by users. To access the dashboard, assume the SageMaker user profile that has RStudio Admin authorization and replace workspaces with admin in the URL, as described on the RStudio administrative dashboard page.

The following screenshot shows an example of the administrative dashboard.

CloudWatch

RStudio on SageMaker publishes logs, as allowed by the RStudio execution role, from the head node and R sessions to CloudWatch. Administrators and developers who have sufficient IAM permissions to view CloudWatch Logs can view the system logs to filter and debug issues. For example, in the following screenshot, you can find logs for an RSession (1), the JupyterLab server (2), and the RStudio server (3) as three distinct log streams on the CloudWatch console.
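You can also query these log streams programmatically with the CloudWatch Logs `FilterLogEvents` API. The following sketch builds such a query. It assumes, and you should verify on your CloudWatch console, that the logs land in the `/aws/sagemaker/studio` log group and that log stream names start with the domain ID followed by the user profile name and app type. The function names are hypothetical.

```python
from typing import List, Optional

# Assumed log group for Studio and RStudio on SageMaker; verify in your account.
LOG_GROUP = "/aws/sagemaker/studio"


def build_log_query(
    domain_id: str,
    user_profile: str,
    app_type: Optional[str] = None,
    filter_pattern: Optional[str] = None,
    limit: int = 50,
) -> dict:
    """Assemble keyword arguments for the CloudWatch Logs FilterLogEvents call."""
    # Assumed stream naming: <domain-id>/<user-profile>/<app-type>/<app-name>
    prefix = f"{domain_id}/{user_profile}/"
    if app_type:
        prefix += f"{app_type}/"
    query = {
        "logGroupName": LOG_GROUP,
        "logStreamNamePrefix": prefix,
        "limit": limit,
    }
    if filter_pattern:
        query["filterPattern"] = filter_pattern
    return query


def fetch_log_messages(query: dict) -> List[str]:
    """Run the query (first page only). Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    events = boto3.client("logs").filter_log_events(**query)["events"]
    return [event["message"] for event in events]
```

For example, `build_log_query("d-xxxxxxxx", "alice", app_type="RSessionGateway", filter_pattern="ERROR")` narrows the search to one user's R sessions; the domain ID and user name here are placeholders.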

Conclusion

RStudio is one of the most popular IDEs for data scientists and ML developers who code in R. Starting today, you can use RStudio on Amazon SageMaker and take advantage of the fully managed infrastructure, access control, networking, and security capabilities in SageMaker. In this post, we detailed the steps to onboard RStudio Workbench paid licenses to License Manager, create a SageMaker domain with RStudio, create users to use RStudio on SageMaker, and administer RStudio on SageMaker via the SageMaker console, RStudio administrative dashboard, and CloudWatch.

You can read more about the data scientist experience in the post Announcing Fully Managed RStudio on Amazon SageMaker for Data Scientists. You can also learn how to Host RStudio Connect and Package Manager for ML development in RStudio on Amazon SageMaker to enhance your team’s experience working with RStudio on SageMaker.

About the Authors

Michael Hsieh is a Senior AI/ML Specialist Solutions Architect. He works with customers to advance their ML journey with a combination of AWS Machine Learning offerings and his ML domain knowledge. As a Seattle transplant, he loves exploring the great outdoors the region has to offer, such as the hiking trails, scenic kayaking in SLU, and the sunset at Shilshole Bay.

Sam Liu is a Product Manager at Amazon Web Services (AWS) focusing on AI/ML infrastructure and tooling. Beyond that, he has 10 years of experience building machine learning applications in various industries. In his spare time, he enjoys golf and international traveling.