By Aman Tsegai
Seven months, two weeks and three days ago, we “officially” released Datazar version 1.0 to the general public. I put officially in quotes because Datazar has always been available to the general public. We develop with our community to ensure that what we’re building is essential to the workflow we always strive to perfect. Innovation and transparency are the two pillars of who we are, therefore we pride ourselves in our ability to innovate openly.
So before I get into any of the details, I would like to thank our community for helping us get here. It would have been impossible without your continuous feedback and criticism. The release of version 2 is a massive leap from version 1. Around 80% of the code base is entirely new. So let’s go ahead and unwrap the toys!
Something that’s entirely new is the Workspace. Before the workspace was introduced, the only thing you could do on Datazar was upload your data, map it with its related files/analysis (manually) and allow other people to download it. Pretty simple. But what’s the point of having a dataset if you can’t use it (visualize/analyze)? That extra step of downloading the dataset, analyzing it and re-uploading whatever the result is just doesn’t cut it (based on our data). So how about the ability to analyze the dataset right then and there using tools you’re already familiar with? Like, R and Python.
We developed a notebook and console interface for both R and Python; the two most popular analysis languages out there (open source).
Now you can analyze any dataset using R and Python with the notebook or console interfaces right in your browser. All the computation is done on Datazar’s servers so you can literally do it using a Chromebook.
We didn’t want to re-invent the wheel so we decided to stick to interfaces everyone is already comfortable with. Namely the console and the notebook. We just made slight modifications; ex: the images in the console interface appear in-line with the text results instead of another window (like it would in your terminal). Using these interfaces also allows for maximum reproducibility.
You can install any packages/modules you want, create any models, charts, visualizations etc… Just as you would in your local machine with your favorite editor.
In the case of R, you can also create RMarkdown files and not just consoles and notebooks.
Sometimes you don’t need to launch R or Python to explore a dataset. Sometimes a simple chart is enough to get you started. So for that exact reason we also introduced One-Click Charts. These are very simple, exploratory charts like scatter-plots, line-plots etc… If you update the dataset, the charts will also update accordingly so they can be used to keep track of things as a function of time.
Redesigned Project Interface
In version 1.0, the Project was an afterthought. The entire platform was centered around the File or dataset. We completely re-designed entire flows to make the platform centered around the Project. Why? Datasets by themselves don’t have the same pull as datasets with supporting documents such as analysis notebooks, notes and publication files.
This change completely changed how you work on the platform. The new interface that’s based on the Project pushes you to collaborate further with quick access to things like Project Discussions, Project Metrics, Project Activity etc…
With the new Project interface, you can now Publish your projects once you’re done analyzing your data. Whether you’re using a MarkDown file or a LaTeX file, you can re-direct your readers to your publication document so it’s the first thing they see.
Projects can now be replicated. Replication in Datazar, just like how you replicate your fellow researchers’ project, allows you to create an exact copy of the project. Once the copy is complete, you can re-run all the analysis and go through the datasets and methods without affecting/altering the original documents. We completely embedded the scientific process into the platform.
Along with the launch of version 2.0, we’re also officially releasing our Plans and Pricing.
The plans are differentiated based on how much data you want to compute, calls to the API and project privacy.
The first factor is computation. Anyone with any plan can upload as much data as they want and create analysis files as big as they want. The only difference is how much data you want to compute in the cloud using R, Python etc…. With the Pro and Team plans, you can compute any file size you want, just like in your own computer*.
The second factor in the pricing structure is access to the API. Anyone with a Datazar account can access the API using tokens. Getting on a Student, Pro or Team plan gives you higher API rate limits.
The third factor is project privacy. All free accounts can create as many public projects as they want while uploading any file type or size. The paid plans allow accounts to create private projects where only invited collaborators have access.
We also created a discounted version of the Pro plan for students with relatively high computational and API limits.
*considering hardware limitations
Faster Rendering: D3 visualizations and charts now load 3.5X faster than before due to the redesign of the rendering engine.
Bulk Upload: Files can now be uploaded in bulk instead of 1 by 1. Datazar will also now automatically detect what kind of file it is (raw data, prepared data, analysis, visualization…)
Community Contributions Control: Project owners and maintainers can now control whether the community can contribute datasets and analysis even if the projects are public.
Documentation: Several guides have been uploaded to docs.datazar.com/guides. In a few weeks, we’ll be opening up the Docs system to everyone so anyone can write guides.
External Sharing: Sharing to social media is now available to all files.
Thanks again to everyone who participated in making this happen, we’re one step closer to re-designing how research is done. If you’re thinking about moving your research to Datazar or just getting into research, don’t hesitate to contact firstname.lastname@example.org. The Explore and Product pages are great resources if you’re on the fence, but as always, the best way to find out is to get right to it.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…
Source:: R News