What is Google Colab?
Google Colab is an online, notebook-like coding environment that is well-suited for machine learning and data analysis.
It comes equipped with many machine learning libraries and offers GPU usage. It is mainly used by data scientists and ML engineers.
Link: https://colab.research.google.com
Is Google Colab free?
Yes, Google Colab is free to use and you can access all of its features to a certain degree. There is also a subscription plan called Google Colab Pro that offers access to upgraded features.
These upgraded features include more processing power and memory. You can access this plan for $9.99 per month if you come from one of the following countries:
- US
- Canada
- UK
- Germany
- France
- India
- Japan
- Thailand
- Brazil
Why should I use Google Colab?
- Google Colab is Free
- Easy to get started
- Allows access to GPUs/TPUs
- Easy to share code with others
- Easy graphical visualizations in Notebooks
Let's go over each of the pros of Google Colab.
First, it is free to use and everyone can access it. There are also some premium features if you want to utilize GPUs/TPUs with more power and fewer limits.
Getting started with Google Colab is easy. You don't need to install any prerequisites or have a decent PC or laptop. All you need is a browser, where you'll get a Jupyter Notebook-like environment.
Google Colab comes ready with GPUs and TPUs which can be used with a click of a single button. This makes Google Colab a great coding environment for machine learning practitioners.
Sharing code with Google Colab can be done through Google Drive or directly to GitHub with a built-in feature.
Being like a Jupyter notebook, a Google Colab document allows you to run code in blocks and intersperse these blocks with Markdown cells. It can also easily display multiple graphical outputs.
All of these have made Google Colab a phenomenal asset when it comes to collaborative data science, machine learning, and data analysis projects.
Why shouldn’t I use Google Colab?
- GPU/TPU usage is limited
- Not the most powerful GPU/TPU setups available
- Not the best de-bugging environment
- It is hard to work with big data
- Have to re-install extra dependencies every new runtime
GPU/TPU usage is not unlimited with Google Colab as resources aren't infinite. The free version lasts for 12 hours of continuous usage and is not very tolerant of inactivity, whilst the Pro version allows 24 hours of continuous usage with greater tolerance.
The free version of Google Colab allows the use of a K80 GPU while the Pro version allows a T4 or P100 GPU. For most of you, these GPUs are way more powerful than the ones you have, but for more money we can get even better ones (e.g. AWS).
Being a notebook environment, it will be hard to catch bugs in your code before running it.
As big datasets need to fit in a Google Drive, it can be difficult to deal with them because you are limited to 15 GB of free space with a Gmail ID.
Lastly, you'll have to (re)install any additional libraries you want to use every time you (re)connect to a Google Colab notebook. A good thing is that it comes equipped with pre-installed libraries that are often used.
What are the alternatives to Google Colab?
Google Colab can be substituted with other platforms that can be more suitable for your needs. Here are some of them:
- Jupyter Notebook
- Kaggle
- Azure Notebooks
- Amazon SageMaker
- Paperspace Gradient
- FloydHub
Does Google Colab support Python?
Yes, Google Colab supports Python (and as of October 2019 only allows the creation of Python 3 notebooks), though in some cases with further tinkering it might be possible to get R, Swift, or Julia to work.
Keep in mind that since 1 January 2020, Python 2 is no longer supported.
How do I get started with Google Colab?
There are several ways to get started with Google Colab and we will go over each of them. All approaches are quite easy and the choice depends on what you want to start working on (i.e. a fresh notebook or a GitHub repository).
The first way is to go over to your Google Drive account. In the top left corner choose “New”, then “More” in the drop-down panel, and then “Google Colaboratory”.
To open an existing Google Colab document, simply right click on it, choose Open With, then Google Colaboratory. You can also load other people's Google Colab documents if you share a Google Drive with them.
To import/open files directly from GitHub you will need the Open in Colab Chrome extension. Add it to your Chrome, then navigate over to the notebook you want to open in GitHub, click on your browser's Extensions tab, then click Open in Colab.
As the extension is new and is still being worked on, I'd advise waiting a bit for it to get polished.
Another way is to go to the Google Colab link given above and click the “New notebook” button. You can also see that you have access to Google Colab examples, recent notebooks, Google Drive, GitHub, and you can upload your own notebook.
After opening up a new notebook you can do a couple of things with it. The first thing is to give it a name in the upper left corner. In the upper right corner, you can click on the settings icon.
When in settings you can switch your theme to dark, set your editor key bindings and colors, change the font size, and more. Be sure to customize these features so they suit your preferences.
Now, let us get introduced to some of the most used shortcuts so we can save ourselves some time (for Mac users CTRL == Command):
- Command Palette – Ctrl+Shift+P
- Add a comment – Ctrl+Alt+M
- Convert to text cell – Ctrl+M M
- Add a new cell below – Ctrl+M B
- Run all cells – Ctrl+F9
- Run the current cell – Ctrl+Enter
- Save Notebook – Ctrl+S
- Show keyboard shortcuts – Ctrl+M H
All shortcuts can be edited to suit your needs.
On the left taskbar, you can see your notebook's table of contents that shows all the Markdown headings in a structured way, useful code snippets, files, and a search and replace tool.
To start coding, look for the Connect button in the upper right side and be sure to click it. When connected you will see something like this:
Now that we know how to use some of the main features of Google Colab, we are ready to start working on a problem. I think that this is the best way to get familiarized with a new environment and learn some new things as a bonus.
How do I import libraries/install dependencies in Google Colab?
Importing libraries and installing dependencies in Google Colab is quite easy. You need to use your usual !pip install and import commands followed by the library/dependency name.
A great thing about Google Colab is that it comes with many preinstalled dependencies that are often used.
Any installation only lasts for the duration of your session, so if you close the session/notebook, you'll have to rerun the inline installations whenever you open your project again.
You can check which version of a library you're using with !pip show. For example, to check which version of TensorFlow you are using you would use:
!pip show tensorflow
To upgrade an already installed library to the latest version, use:
!pip install --upgrade tensorflow
And lastly, to install a specific version, use:
!pip install tensorflow==1.2
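The same version check can also be done from Python itself, which is handy when a notebook should react to what is installed. This is only a minimal sketch, assuming Python 3.8+ (where importlib.metadata is part of the standard library); the helper name installed_version is made up for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report a few libraries this article relies on.
for name in ("tensorflow", "pandas", "scikit-learn"):
    print(name, installed_version(name) or "not installed")
```

Since installations do not survive a new runtime, a check like this at the top of a notebook shows quickly what has to be reinstalled.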
How do I enable GPU/TPU usage in Google Colab?
All you have to do in Google Colab to enable a GPU or TPU is head over to the “Runtime” section, choose “Change runtime type” and select either GPU or TPU.
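After switching the runtime type, it can be worth confirming that a GPU is actually visible. The sketch below is one possible check, assuming an NVIDIA GPU runtime where the nvidia-smi command line tool is on the PATH; the helper name visible_gpus is hypothetical, and the check says nothing about TPUs.

```python
import shutil
import subprocess


def visible_gpus():
    """Return the GPU lines printed by `nvidia-smi -L`, or an empty list."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools on this runtime
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]


gpus = visible_gpus()
print(gpus if gpus else "No GPU visible; check Runtime > Change runtime type.")
```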
Due to the specialist nature of TPUs, there are some good practices you can use to help optimize your data flow and use them to their full potential. The “TPUs in Colab” section of the Google Colab docs highlights some of these.
How do I import data in Google Colab?
There are several ways to import data with Google Colab from a Google Drive, including mounting your Google Drive in the Colab notebook's runtime virtual machine, using PyDrive, and using a native REST API.
We'll go over how to mount your Google Drive quickly here, but you can learn about how to use the other methods (and other data loading/saving options) here.
To mount your drive, simply run the following code:
from google.colab import drive
drive.mount('/content/drive')
You'll be given a link to authorize this action via a code output:
After clicking the link, it will take you to the following screen where you will click Allow.
After that, an authorization code will appear that you will copy and paste into the cell. Press Enter and that's it.
Now, for example, you could open a text file in your drive, write some text in it and save it:
with open('/content/drive/My Drive/foo.txt', 'w') as f:
    f.write('Hello Google Drive!')
!cat /content/drive/My\ Drive/foo.txt
drive.flush_and_unmount()
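If the same notebook should also run outside of Colab, the mount step can be guarded. The following is only a sketch: it assumes that the google.colab module is already loaded inside a Colab runtime (a commonly used check, not an official guarantee), and the helper names in_colab and working_dir are made up for this example.

```python
import sys
import tempfile
from pathlib import Path


def in_colab():
    """True when the code appears to run inside a Colab runtime."""
    return "google.colab" in sys.modules


def working_dir():
    """Return the Drive folder in Colab, or a temporary local folder elsewhere."""
    if in_colab():
        from google.colab import drive
        drive.mount('/content/drive')
        return Path('/content/drive/My Drive')
    return Path(tempfile.mkdtemp())


# Write a file and read it back, wherever the notebook happens to run.
target = working_dir() / 'foo.txt'
target.write_text('Hello Google Drive!')
print(target.read_text())
```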
For machine learning work, we will usually be loading tabular data from csv or xlsx files into pandas data frames or loading image data into arrays, so let's quickly cover how to do exactly those things with Google Colab!
Let's assume we've already mounted our drive as shown just above. Upload the csv/xlsx file you want to use onto Google Drive, then browse for its location.
The default path to your drive is '/content/drive/My Drive/', and the file that we used in another article sits directly in the files section, not in any further folders, so its path is '/content/drive/My Drive/Name'.
We can now just use the pandas read_csv function to load the file directly into a pandas data frame:
import pandas as pd
path = "/content/drive/My Drive/Name"
df = pd.read_csv(path)
df  # displays dataframe
You can also upload a file directly from your computer using the following code:
from google.colab import files
uploaded = files.upload()
Click on Choose Files, browse to your desired file and open it.
Finally, we can use the BytesIO class from the io module to load the data into a pandas data frame:
import io
df2 = pd.read_csv(io.BytesIO(uploaded['reddit_wsb.csv']))
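To see why BytesIO is needed, it helps to look at the shape of the upload result: a dictionary that maps each file name to the raw bytes of the file. The sketch below fakes such a dictionary (the file name and its contents are made up) and parses it with the standard csv module so that it runs without pandas or Colab.

```python
import csv
import io

# Stand-in for the return value of files.upload(): {filename: file bytes}.
uploaded = {"example.csv": b"title,score\nfirst post,10\nsecond post,25\n"}


def rows_from_upload(uploaded_files, filename):
    """Decode one uploaded file and parse it as CSV rows (a list of dicts)."""
    raw = uploaded_files[filename]
    # BytesIO turns the bytes into a file-like object; TextIOWrapper decodes it.
    text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
    return list(csv.DictReader(text_stream))


rows = rows_from_upload(uploaded, "example.csv")
print(rows[0]["title"], rows[0]["score"])
```

pd.read_csv accepts the same kind of file-like object, which is why wrapping the uploaded bytes in io.BytesIO is enough.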
As the dataset that we'll use comes from Kaggle, we shall open it directly in Google Colab without needing to download it manually or use one of the above-mentioned ways.
Using Machine Learning in Google Colab to predict house prices
Our main problem will be to create a machine learning algorithm in Google Colab that will be able to predict future house prices. For this, we will use the good old California housing dataset that can be found here.
How can I load Kaggle datasets directly into Google Colab?
Kaggle has its own API client which allows users to obtain datasets directly in their notebook, and I'll show you how to do it for Google Colab.
For this, we will need to create a Kaggle API token. Go to your Kaggle account details and scroll down to the API section. When there, click the “Create New API Token” button and a Kaggle JSON file will be downloaded.
Now we go back to our notebook and import the required libraries to load the dataset:
# pip install kaggle - should come preinstalled
import pandas as pd
from google.colab import files
Now let's upload our downloaded kaggle.json file and check if it is in the right place:
files.upload()
!ls -lha kaggle.json
The next thing that we need to do is to set up the file configuration:
# The Kaggle API client expects this file to be in ~/.kaggle,
# so we will copy it there
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
# This permissions change avoids a warning on Kaggle tool startup.
!chmod 600 ~/.kaggle/kaggle.json
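The three shell commands above can also be written in plain Python, which may be easier to reuse across notebooks. This is a sketch under stated assumptions: the helper name install_kaggle_token and its optional home argument are invented for illustration, and the demo runs in a temporary directory with an empty stand-in token so nothing real is touched.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def install_kaggle_token(token_path, home=None):
    """Copy kaggle.json into <home>/.kaggle and make it readable by the owner only."""
    home = Path(home) if home is not None else Path.home()
    kaggle_dir = home / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)  # same as mkdir -p
    destination = kaggle_dir / "kaggle.json"
    shutil.copy(token_path, destination)           # same as cp
    os.chmod(destination, 0o600)                   # same as chmod 600
    return destination


# Dry run with a stand-in token inside a throwaway folder.
sandbox = Path(tempfile.mkdtemp())
fake_token = sandbox / "kaggle.json"
fake_token.write_text("{}")
installed = install_kaggle_token(fake_token, home=sandbox / "home")
print(installed, oct(stat.S_IMODE(installed.stat().st_mode)))
```

In the notebook itself the call would simply be install_kaggle_token("kaggle.json"), which uses the real home directory.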
Now we are ready to download our dataset directly from Kaggle. Go to our dataset on Kaggle and copy the API command as shown below:
Then paste it as a new command and Google Colab will download it:
!kaggle datasets download -d camnugent/california-housing-prices
Now we unzip the file and remove the zip:
# Unzip the data and delete the zip
!unzip california-housing-prices.zip && rm california-housing-prices.zip
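For those who prefer to stay in Python, the unzip-and-delete step can be sketched with the standard zipfile module. The helper name unzip_and_remove is hypothetical, and the demo below builds a tiny throwaway archive instead of touching the real download.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_and_remove(zip_path, dest_dir="."):
    """Extract every member of the archive into dest_dir, then delete the archive."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)
    zip_path.unlink()
    return names


# Dry run on a throwaway archive; in the notebook the call would be
# unzip_and_remove("california-housing-prices.zip", "/content").
workdir = Path(tempfile.mkdtemp())
demo_zip = workdir / "demo.zip"
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("housing.csv", "longitude,latitude\n-122.23,37.88\n")
extracted = unzip_and_remove(demo_zip, workdir)
print(extracted)
```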
If you open the files icon on the toolbar on the left, you will see our obtained housing dataset.
Let's load the dataset:
df = pd.read_csv('/content/housing.csv')
df.head()
How can I visualize data/produce charts using Google Colab?
Google Colab is similar to Jupyter notebook, so you can immediately see your graph after running the graph command. The most used graph libraries are matplotlib, seaborn, ggplot, plotly, and more.
Now that we have our dataset we want to explore it by conducting an exploratory data analysis (EDA). Let's give our dataset a quick glance to see the values it holds.
df.info()
We can already see that the variable total_bedrooms has some missing values. We also see that all variables are numerical except for the ocean_proximity one. We will take care of it later, but let's see what it has:
df['ocean_proximity'].value_counts()
For a quick glance at the numerical variables, we can use the pandas describe function as shown:
df.describe()
Now let's graph these variables with matplotlib. We will first check out the histograms:
import matplotlib.pyplot as plt
df.hist(bins=50, figsize=(15,10))
plt.show()
The first thing that we can see from the median_income histogram is that the values are preprocessed, in this case they are scaled. After checking out the information behind the data I've found that each number is expressed in tens of thousands of dollars (e.g. 5 ≈ $50,000).
We also see that most of our distributions are quite skewed, i.e. they lean more towards the left side. Also, our variables have different scales and we will deal with this issue later on.
Here is a tip: before performing an EDA be sure to split the data into train and test sets to avoid the data snooping bias. Data snooping is the inappropriate use of data mining to uncover misleading relationships in data.
So let's split our data into train and test sets, but with a twist. As median income is a good predictor of the house value, we need to split our data in a way that will be representative of the median income strata.
If you check out the median income histogram you will see that most of the data is between 1.5 and 6 but it goes beyond 6 as well. If the sets don't have enough examples of each stratum the model might be biased towards one.
In order to combat this, we will use the pd.cut function to stratify the data by 1.5 increments.
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
# As we want the income strata to be representative we will split the data by them
# But first, we need to create these strata
df['income_stratums'] = pd.cut(df['median_income'],
                               bins=[0., 1.5, 3.0, 4.5, 6., np.inf],
                               labels=[1, 2, 3, 4, 5])
df['income_stratums'].hist()
Now we split the data by the strata and delete them:
# Split the data by strata
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(df, df['income_stratums']):
    train_set = df.loc[train_index]
    test_set = df.loc[test_index]
# Delete the income_stratums column
for stratum in (train_set, test_set):
    stratum.drop('income_stratums', axis=1, inplace=True)
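To make the idea behind the stratified split more concrete, here is a plain-Python illustration of what the binning and the per-stratum sampling do. It is not how pd.cut or StratifiedShuffleSplit are implemented; the function names and the toy incomes are made up, and only the bin edges are taken from the code above.

```python
import bisect
import random
from collections import Counter, defaultdict

# Same bin edges as the pd.cut call: (0, 1.5], (1.5, 3], (3, 4.5], (4.5, 6], (6, inf)
EDGES = [0.0, 1.5, 3.0, 4.5, 6.0, float("inf")]


def income_stratum(median_income):
    """Map a positive median income to its stratum label 1..5 (right-closed bins)."""
    return bisect.bisect_left(EDGES, median_income)


def stratified_split(incomes, test_size=0.2, seed=42):
    """Return (train_indices, test_indices), sampling test_size from every stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for index, income in enumerate(incomes):
        by_stratum[income_stratum(income)].append(index)
    train_indices, test_indices = [], []
    for indices in by_stratum.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_indices.extend(indices[:n_test])
        train_indices.extend(indices[n_test:])
    return train_indices, test_indices


# Made-up incomes, only for illustration.
incomes = [0.5 + 0.01 * i for i in range(1000)]
train_indices, test_indices = stratified_split(incomes)

# Share of each stratum that ended up in the test set (all should be close to 0.2).
totals = Counter(income_stratum(x) for x in incomes)
in_test = Counter(income_stratum(incomes[i]) for i in test_indices)
for label in sorted(totals):
    print(label, round(in_test[label] / totals[label], 3))
```

Because every stratum contributes roughly the same share to the test set, rare income levels are not left out by chance, which is the point of stratifying.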
The next thing is to plot the data so that the graph shows us the house prices. The radius of each circle will show the population of a district and the color will represent the price.
train_set.plot(kind='scatter', x='longitude', y='latitude', figsize=(10,10),
               s=train_set['population']/100, label='population',
               c='median_house_value', cmap='rainbow', colorbar=True)
plt.legend()
As we can see, the high-density areas are the Bay Area and the areas around San Diego and Los Angeles. Also, there is density in the Central Valley around Fresno and Sacramento.
The house prices are also correlated with the density areas, as one could expect. Also, houses close to the sea tend to be more expensive. Speaking of correlations, we should check them out.
But before we go on to create a correlation matrix we should see if our features make sense, i.e. whether we could make them more informative.
# Before looking at correlations we might want to create new features
# that make more sense
# Let's look at the variables that we have
train_set.head()
For example, three features can be combined to be more informative. Try to find them. If you're thinking of making bedrooms per room, population per household, and rooms per household you are right.
# Create new features
df['bedrooms_per_room'] = df['total_bedrooms']/df['total_rooms']
df['population_per_household'] = df['population']/df['households']
df['rooms_per_household'] = df['total_rooms']/df['households']
# Check for correlations (numeric columns only, ocean_proximity is text)
correlations = df.corr(numeric_only=True)
correlations['median_house_value'].sort_values(ascending=False)
We can see that our rooms_per_household is more correlated with the label than total_rooms or households. Also, bedrooms_per_room is more correlated with the label than its parent variables.
To interpret some of them, we see that the higher the median income is, the higher the house price is. The lower the bedrooms per room ratio is, the higher the price gets.
Let's see how the top four variables by correlation look when plotted against each other:
variables = ['median_house_value', 'median_income', 'bedrooms_per_room', 'rooms_per_household']
pd.plotting.scatter_matrix(df[variables], figsize=(12, 10))
Okay, here we can see how the variables behave relative to each other. If you look at median_income and median_house_value you might notice that our prices are capped at $500k.
You can also notice that the data tends to group in a few horizontal lines around $450k, $350k, and $280k. These are the things one should take care of before passing the data to the algorithm, as the algorithm might learn these occurrences.
How can I deploy ML algorithms in Google Colab?
Machine learning algorithms can be used in Google Colab the same way you use them in any other coding environment. Google Colab also comes with preinstalled ML libraries like TensorFlow and scikit-learn.
Have in mind that we will go fast through the following sub-sections and that you should check out our Sklearn introduction article if you get stuck on a certain point.
Prepare the Data
Before we pick a few ML models and deploy them, we want to prepare (preprocess) our data to be ready for the algorithm. So let's do that.
We already know that we have one categorical feature (ocean_proximity) and the rest are numerical. As they are different from each other, they will be preprocessed in different ways.
First, let's split our train set into two parts where one will contain the label.
# Split Train Set
housing_features = train_set.drop('median_house_value', axis=1)
housing_label = train_set['median_house_value'].copy()
Now, as we want to automate the process of data preparation, we will split housing_features into numerical and categorical. After that, we will create our own function and sklearn pipelines to process them.
# Split housing_features to categorical and numerical sets
numerical = housing_features.drop('ocean_proximity', axis=1)
categorical = housing_features['ocean_proximity'].copy()
Now we will create a sklearn numerical pipeline that:
- Imputes missing data by the median value
- Creates new features
- Standardizes the numerical features
The first thing we want to do is to import our dependencies and create a custom function that will create new features (the ones we made before). The function should also allow us to choose which features to include so we can test them.
from sklearn.pipeline import Pipeline
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
# Create a transformer that creates new features (Inspired by Aurelien Geron)
# Column positions inside the numerical feature array
rooms_ix, bedrooms_ix, population_ix, households_ix = 3, 4, 5, 6
class FeatureGenerator(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True):
        self.add_bedrooms_per_room = add_bedrooms_per_room
    def fit(self, X, y=None):
        return self  # nothing to learn from the data
    def transform(self, X, y=None):
        rooms_per_household = X[:, rooms_ix] / X[:, households_ix]
        population_per_household = X[:, population_ix] / X[:, households_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
            return np.c_[X, rooms_per_household, population_per_household, bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_household, population_per_household]
Now we can create our full numerical pipeline:
# Numerical pipeline
numerical_pipe = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('feature_generator', FeatureGenerator()),
    ('standardizer', StandardScaler())
])
The next step is to create a categorical pipeline that will perform one hot encoding on the ocean_proximity feature. We will combine the two pipelines into a single one and run all features through it.
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
num_features = list(numerical)
cat_features = ["ocean_proximity"]
full_pipeline = ColumnTransformer([
    ("num", numerical_pipe, num_features),
    ("cat", OneHotEncoder(), cat_features),
])
# Run all features through the pipeline
housing = full_pipeline.fit_transform(housing_features)
housing.shape
(16512, 16)
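If the 16 columns look surprising, the arithmetic can be checked by hand. The sketch below is a plain-Python illustration of one hot encoding, not the scikit-learn implementation; it assumes the five ocean_proximity categories of the California housing data, and the one_hot helper is made up for this example.

```python
def one_hot(values):
    """Return (sorted categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1
        rows.append(row)
    return categories, rows


# The five ocean_proximity categories, in arbitrary order with one repeat.
sample = ["NEAR BAY", "<1H OCEAN", "INLAND", "NEAR OCEAN", "ISLAND", "INLAND"]
categories, encoded = one_hot(sample)
print(categories)
print(encoded[0])  # the row for "NEAR BAY"

# 8 numerical columns + 3 generated ratio features + one column per category.
total_columns = 8 + 3 + len(categories)
print(total_columns)
```

Each row contains exactly one 1, and the total of 16 agrees with the second value of housing.shape.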
It comes in quite handy to preprocess your data all at once. And you can easily save your pipelines and functions to use later. When doing ML you will see that you'll soon begin creating a number of your own custom functions.
Pick an Algorithm and evaluate it
As this is a supervised regression problem, we will pick a regression model. When doing ML it is advisable to pick multiple algorithms and compare them to pick the best one.
For the brevity of the article, we will go for a single one, which is the Random Forest Regressor. For practice, you can try to build a multiple algorithm pipeline that runs them and prints the comparison.
We will also check the RMSE, which basically shows us the discrepancy between the predicted and observed values.
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
# Fit a random forest on the prepared training data
clf = RandomForestRegressor(random_state=42)
clf.fit(housing, housing_label)
# RMSE on the training set
predictions = clf.predict(housing)
mse = mean_squared_error(housing_label, predictions)
rmse = np.sqrt(mse)
rmse
18728.778
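For reference, the RMSE is the square root of the mean of the squared differences between observed and predicted values, so it is expressed in the same unit as the label (dollars here). A minimal hand-written version, with made-up toy values, could look like this:

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences of numbers."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy house values in dollars (made up for illustration).
observed = [200_000, 350_000, 120_000]
predicted = [210_000, 330_000, 120_000]
print(rmse(observed, predicted))
```

Large errors are squared before averaging, so a few badly predicted houses raise the RMSE more than many small misses.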
Not bad. But we are likely overfitting our data. The thing that we want to do next is to optimize the algorithm.
Optimize the Algorithm
In order to optimize the machine learning algorithm, we will perform a randomized search over its hyperparameters. The search will look for the optimal ones that we should use for our model.
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint
hyperparameters = {
    'n_estimators': randint(low=1, high=250),
    'max_features': randint(low=1, high=10),
}
rnd_search = RandomizedSearchCV(clf, param_distributions=hyperparameters,
                                n_iter=15, cv=5, scoring='neg_mean_squared_error',
                                random_state=42)
rnd_search.fit(housing, housing_label)
Let's see our errors and pick the hyperparameters that give us the lowest value:
cv = rnd_search.cv_results_
for mean_score, params in zip(cv["mean_test_score"], cv["params"]):
    print(np.sqrt(-mean_score), params)
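Reading the best candidate off a printed list is error-prone, so the selection can be automated. The sketch below uses a hypothetical stand-in for the search results with made-up scores; only the two keys mean_test_score and params mirror the loop above, and the helper name best_by_rmse is invented.

```python
import math

# Hypothetical stand-in for rnd_search.cv_results_ (scores are negative MSE).
cv_demo = {
    "mean_test_score": [-2.6e9, -2.4e9, -2.5e9],
    "params": [
        {"max_features": 3, "n_estimators": 50},
        {"max_features": 8, "n_estimators": 189},
        {"max_features": 5, "n_estimators": 120},
    ],
}


def best_by_rmse(cv_results):
    """Return (rmse, params) for the candidate with the lowest RMSE."""
    candidates = [
        (math.sqrt(-score), params)
        for score, params in zip(cv_results["mean_test_score"], cv_results["params"])
    ]
    return min(candidates, key=lambda pair: pair[0])


best_rmse, best_params = best_by_rmse(cv_demo)
print(round(best_rmse), best_params)
```

The scores are negated because scikit-learn maximizes its scoring functions, so the highest negative MSE is the lowest error. A fitted search object also exposes the winning combination directly through its best_params_ attribute.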
Looks like it is 8 max features and 189 estimators. There are other things to do, like running a grid search to see which features are the best and the like. But that won't be our focus for this article.
Let's see how the model does on the test set:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
model = RandomForestRegressor(n_estimators=189, max_features=8, random_state=42)
X_test = test_set.drop("median_house_value", axis=1)
y_test = test_set["median_house_value"].copy()
# Reuse the fitted pipeline, do not fit it again on the test data
X_test = full_pipeline.transform(X_test)
model.fit(housing, housing_label)
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
rmse = np.sqrt(mse)
rmse
46879.44
That's it. Feel free to play with other regression models and see how they behave while we move on to our next header.
How can I save my Google Colab notebook directly to GitHub?
To save your Google Colab notebook to GitHub, go to the “File” section and choose “Save a copy in GitHub”. After that, a pop-up screen will ask for your authorization. You will then do the usual repository push.
How can I mount external Python files in Google Colab?
Suppose you have some Python code stored in your Google Drive and you want to run it in Google Colab with their GPU/TPU. To mount the external files write the following command:
from google.colab import drive
drive.mount('/content/drive')
You will be provided with a URL that will take you to a new tab to give permission to Google Drive. After you allow access to Google Drive you will be given an authorization code to insert in your code cell.
To list the contents of your drive run the following command:
!ls "/content/drive/My Drive/Colab Notebooks"
To run a particular file, for example hello.py, write the following:
!python3 "/content/drive/My Drive/Colab Notebooks/hello.py"
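Running a script with !python3 starts a separate process, so its functions are not available in the notebook afterwards. If you would rather import the code, one option is to add the folder to sys.path. This is only a sketch: the module name hello_helpers and its greet function are invented, and a temporary folder stands in for the mounted Drive folder so the example runs anywhere.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for "/content/drive/My Drive/Colab Notebooks".
scripts_dir = Path(tempfile.mkdtemp())
(scripts_dir / "hello_helpers.py").write_text(
    "def greet(name):\n    return f'Hello, {name}!'\n"
)

# Make the folder importable, then import the file as a regular module.
if str(scripts_dir) not in sys.path:
    sys.path.append(str(scripts_dir))
importlib.invalidate_caches()
hello_helpers = importlib.import_module("hello_helpers")

print(hello_helpers.greet("Colab"))
```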
What are Google Colab Magics?
Google Colab magics are a set of system commands that can be seen as a mini extensive command language. There are two types of magics: line magics and cell magics.
The line magics begin with %, while the cell magics begin with %%. To see a full list of available magics run the following command:
%lsmagic
Now let's run a line magic that will show you your local directory:
%ldir
And a cell magic:
%%html
Welcome to Algotrading101!
What are some other interesting Google Colab features?
Google Colab has other interesting features like Markdown that shows nice mathematical equations, custom widgets, forms, and more. To take a look at these features go here.
What are the 3 Common Machine Learning Analysis/Testing Mistakes?
When you run your analysis, there are three common mistakes to take note of:
- Overfitting
- Look-ahead Bias
- P-hacking
Do check out this slide PDF to learn more: 3 Big Mistakes of Backtesting – 1) Overfitting 2) Look-Ahead Bias 3) P-Hacking
Full Code
GitHub link