Building the future of collaborative AI development with Akshay Agrawal

Gradient Dissent

Tuesday, January 7, 2025 · 41 min

What You'll Learn

✓Marimo is a modern replacement for Jupyter notebooks, built to be reproducible, Git-friendly, and enable deployment of notebooks as web apps.
✓Jupyter notebooks often suffer from reproducibility issues, where re-running a notebook doesn't always yield the same results due to the REPL-like execution model.
✓Marimo uses a reactive, DAG-based execution model to guarantee that the code on the page matches the outputs, and small code changes yield small diffs in Git.
✓Streamlit is great for building interactive applications, but doesn't provide the same exploratory data workflow as notebooks. Marimo aims to bridge that gap.
✓Marimo has seen strong initial adoption, with growth coming both organically through sharing of notebooks and some targeted efforts like an integration with Hugging Face.
✓Akshay interned at Netflix, where notebooks are widely used in production, but says the internship didn't much inform Marimo's design; his main inspirations were Pluto.jl (and, by extension, Observable) and Streamlit.

AI Summary

The episode discusses Marimo, a new open-source Python notebook created by Akshay Agrawal and his co-founder Myles. Marimo is designed to address key issues with Jupyter notebooks, such as lack of reproducibility and poor Git-friendliness. Marimo provides a reactive, DAG-based execution model to ensure code and outputs stay in sync, and it stores notebooks as pure Python for better version control. The host and Akshay also compare Marimo to other notebook-like tools like Streamlit, discussing how Marimo bridges the gap between exploratory data work and deployable applications.

Topics Discussed

#Reproducible notebooks · #Git-friendly notebook formats · #Reactive notebook execution models · #Bridging exploratory data work and application development · #Notebook-based development workflows

Frequently Asked Questions

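What is "Building the future of collaborative AI development with Akshay Agrawal" about?

Akshay Agrawal, co-founder of Marimo, joins host Lukas Biewald to discuss Marimo, an open-source reactive Python notebook designed to fix Jupyter's reproducibility and Git-friendliness problems and to let notebooks be deployed as web apps.
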
What topics are discussed in this episode?

This episode covers the following topics: reproducible notebooks, Git-friendly notebook formats, reactive notebook execution models, bridging exploratory data work and application development, and notebook-based development workflows.

What is key insight #1 from this episode?

Marimo is a modern replacement for Jupyter notebooks, built to be reproducible, Git-friendly, and enable deployment of notebooks as web apps.

What is key insight #2 from this episode?

Jupyter notebooks often suffer from reproducibility issues, where re-running a notebook doesn't always yield the same results due to the REPL-like execution model.

What is key insight #3 from this episode?

Marimo uses a reactive, DAG-based execution model to guarantee that the code on the page matches the outputs, and small code changes yield small diffs in Git.

What is key insight #4 from this episode?

Streamlit is great for building interactive applications, but doesn't provide the same exploratory data workflow as notebooks. Marimo aims to bridge that gap.

Who should listen to this episode?

This episode is recommended for anyone interested in reproducible notebooks, Git-friendly notebook formats, and reactive notebook execution models, and for those who want to stay up to date on the latest developments in AI and technology.

Episode Description

In this episode of Gradient Dissent, Akshay Agrawal, Co-Founder of Marimo, joins host Lukas Biewald to discuss the future of collaborative AI development. They dive into how Marimo is enabling developers and researchers to collaborate seamlessly on AI projects, the challenges of scaling AI tools, and the importance of fostering open ecosystems for innovation. Akshay shares insights into building a platform that empowers teams to iterate faster and solve complex AI challenges together. Follow Weights & Biases: https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server: https://discord.gg/CkZKRNnaf3

Full Transcript

You're listening to Gradient Dissent, a show about making machine learning work in the real world, and I'm your host, Lukas Biewald. Akshay Agrawal is the CEO and co-founder of Marimo, a new kind of notebook that's been exploding in popularity. I myself am a big fan of the product. He's clearly passionate about building developer tools for AI engineers, just like I am, and it's super fun to nerd out with him. I hope you enjoy this episode.
All right, so, Akshay, I'm a fan of your product and a user of your product, Marimo, and I'm sure this is going to get super in the weeds, for Marimo users only, shortly. But I wanted to keep at least the first few minutes of this recording accessible to more people. So maybe we should start with: what is a notebook, and why is a notebook important in the AI field?
Yeah, 100%. Most notebooks these days are in Python, and I view a notebook as a programming environment that allows for interactive computing. It lets you write blocks of Python code and visualize the output of each block. So you can have plots, you can visualize your tensors, you can visualize training runs, and you can interleave your code and your visuals with Markdown that documents the exploration, experiment, or whatever you've done. For data work in particular, whether you're training a machine learning model or hitting databases, it's really valuable to have a programming environment that lets you see your data while you work on it. For this reason, notebooks are widely used across the sciences, but they're also widely used to train machine learning models, and they're the central interface for compute in a bunch of products like Google Colab, Databricks, and AWS SageMaker.
And I think it speaks to the fact, and this is one of the theses behind Weights & Biases, that the workflow of someone building an AI model is different from a developer workflow. It's more exploratory. You write a lot of code that might never see the light of day or need to be production-ready. A lot of the code you write is to learn: okay, what's going on with my data, what's going on with my model, and things like that, right?
Yeah, 100%.
And, you know, Jupyter notebooks are kind of where it started. Colab is a little bit different, Streamlit is a little bit different, Marimo is a little bit different. Maybe you could talk about how you thought about Marimo and how it fits into this world of notebook variants.
Yeah, so maybe to set the stage, I can briefly say what Marimo is and how it's different from Jupyter. Marimo is an open-source Python notebook that's built from the ground up to solve what I viewed as key problems with Jupyter notebooks in particular. Unlike Jupyter notebooks, Marimo notebooks are reproducible. They're Git-friendly. You can deploy them as interactive web apps, and you can execute them as Python scripts. So with Marimo, what Myles, my co-founder, and I are really trying to do is blend the best parts of interactive computing with the rigor and discipline of traditional software engineering, as well as the developer experience associated with engineering. You can think of Marimo as a modern replacement for Jupyter, but also for Streamlit.
And what do you mean by reproducible?
Yeah, so there are two types of reproducibility. There's an interesting study by JetBrains where they took 10 million Jupyter notebooks from GitHub, downloaded them all, and studied whether, if you reran them from top to bottom, you would get the same results that were serialized in the notebook.
Over a third of them weren't reproducible, meaning that you would get different outputs. There was another paper from 2019 that did a similar study and found similar results. And in my own experience, when I was at Stanford doing my PhD, my co-authors, who were amazing, great people but not software engineers, would hand me a notebook. They'd say, "Here's my code," and I would run it and be like, "I can't reproduce your science." So I think there's this reproducibility crisis, in that the code on the page doesn't necessarily match the outputs you see in Jupyter notebooks. That's specifically because a Jupyter notebook is really a dressed-up REPL, like a Python REPL. It shows the code you executed and the output at the time, but it doesn't capture the execution history in a meaningful way. So that's one kind of reproducibility, and it's the biggest kind. The other kind, which we can talk about later, is package reproducibility: you may run an experiment but not have all the packages you used documented clearly, so others can't reproduce your environment.
Now, I mean, the initial experience of Marimo feels a lot like a notebook. What is different about what's going on that adds reproducibility?
Marimo provides the user a guarantee, which is that the code on the page matches the outputs you see. The way we do that is that Marimo looks like a notebook and feels like a notebook, but it actually has code intelligence built in. When you run one cell, all other cells that refer to the variables that the first cell defined are automatically run. So it's reactive, not unlike a spreadsheet. That means that if you edit one part of your notebook, you will see downstream cells update to stay in sync with your code.
But it's only the cells below, right? Does it update the cells above also?
Above as well.
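Akshay's point that a Jupyter notebook is "a dressed-up REPL" can be made concrete with a small Python sketch. It is a toy model, not Jupyter's implementation: the dictionary standing in for the kernel and the run_cell helper are hypothetical. Cells share one namespace and run in whatever order the user chooses, so edited or deleted cells leave state behind that a fresh top-to-bottom run would not reproduce.

```python
def run_cell(source: str, namespace: dict) -> None:
    """Execute one cell's source in the given (shared) namespace."""
    exec(source, namespace)


# The "page": what a reader sees in the notebook file.
page = {"c1": "x = 1", "c2": "y = x + 1"}
kernel: dict = {}  # live REPL state that outlives individual cells

# Session: run both cells, then edit c1 and re-run only c1.
run_cell(page["c1"], kernel)
run_cell(page["c2"], kernel)
page["c1"] = "x = 10"
run_cell(page["c1"], kernel)
stale_y = kernel["y"]  # still computed from the old x

# A fresh top-to-bottom run of the same page disagrees.
fresh: dict = {}
for name in ("c1", "c2"):
    run_cell(page[name], fresh)
fresh_y = fresh["y"]

# Deleting a cell does not delete the variable it defined.
del page["c1"]
run_cell(page["c2"], kernel)  # works only because x lingers in the kernel
fresh_after_delete: dict = {}
try:
    run_cell(page["c2"], fresh_after_delete)
    fresh_run_fails = False
except NameError:
    fresh_run_fails = True

print(stale_y, fresh_y, fresh_run_fails)  # 2 11 True
```

This kind of mismatch between the code on the page and the displayed outputs is what rerunning notebooks from top to bottom, as in the studies mentioned above, surfaces.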
So how it actually works is that we statically parse all your cells to find the dependencies across variable definitions and references, and we basically form a DAG on them.
How do you keep it acyclic? I mean, couldn't there be circular dependencies here?
So there's basically a contract between Marimo and the user, and one of the rules is that you can't have cycles. We can detect that, so if you type in some code that introduces a cycle, we won't let you run it. We'll say, "Hey, you have a cycle. It's across these cells. Here are some suggestions on how you can fix it."
I see. And then what about this other stuff? What makes notebooks not compatible with Git? Is it the giant files with the output included that make it annoying for Git? Is that what you're talking about there?
Yeah, yeah. Jupyter notebooks are stored as JSON files, right, holding the notebook code but also the outputs, and you get this giant blob. So you make a small code change and you get a giant diff. Marimo stores your notebooks as pure Python, and we guarantee that small code changes yield small diffs.
That makes sense. And so what are your thoughts on Streamlit? I know that when Streamlit started, it was an exciting new thing that at least added interactivity to notebooks in a really cool way. I had the Streamlit founder on the podcast, and we talked about what he was thinking at the time. But what's your take on Streamlit? Why isn't that a satisfying solution to some of these problems?
Streamlit is a really cool project, and I think there's a reason it took off so quickly: data scientists and machine learning folks hate writing any front-end code. So I think that interactivity was really delightful.
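The approach Akshay describes (statically parse each cell, link definitions to references, form a DAG, and refuse cycles) can be sketched with Python's standard ast and graphlib modules. This is an illustrative approximation, not Marimo's code: the helper names are hypothetical, scoping is ignored, and the sketch assumes each global name is defined by exactly one cell.

```python
import ast
from graphlib import CycleError, TopologicalSorter


def defs_and_refs(source: str) -> tuple[set[str], set[str]]:
    """Return (names the cell defines, names it reads but does not define).

    Simplification: ignores scoping, so names bound inside functions count too.
    """
    defs: set[str] = set()
    refs: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                refs.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defs.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defs.add((alias.asname or alias.name).split(".")[0])
    return defs, refs - defs


def build_graph(cells: dict[str, str]) -> dict[str, set[str]]:
    """Map each cell id to the ids of the cells it depends on."""
    info = {cid: defs_and_refs(src) for cid, src in cells.items()}
    owner = {name: cid for cid, (defs, _) in info.items() for name in defs}
    return {
        cid: {owner[n] for n in refs if n in owner and owner[n] != cid}
        for cid, (_, refs) in info.items()
    }


def cells_to_rerun(changed: str, parents: dict[str, set[str]]) -> list[str]:
    """The changed cell plus everything downstream of it, in a valid run order.

    Raises graphlib.CycleError if the cells form a cycle.
    """
    order = list(TopologicalSorter(parents).static_order())  # detects cycles
    affected = {changed}
    for cid in order:  # parents always come before children in `order`
        if parents[cid] & affected:
            affected.add(cid)
    return [cid for cid in order if cid in affected]


# "summary" sits at the top of the page but depends on cells below it.
cells = {
    "summary": "total = doubled + offset",
    "data": "raw = [1, 2, 3]",
    "double": "doubled = sum(raw) * 2",
    "config": "offset = 10",
}
graph = build_graph(cells)
print(cells_to_rerun("data", graph))  # ['data', 'double', 'summary']

# Two cells that read each other's variables form a cycle and are rejected.
try:
    cells_to_rerun("a", build_graph({"a": "x = y + 1", "b": "y = x + 1"}))
    cycle_detected = False
except CycleError as err:
    cycle_detected = True
    print("cycle across cells:", err.args[1])
```

Because the run order comes from the graph rather than from page position, the summary cell at the top of the page is rerun after the cells below it, which matches the "above as well" behavior.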
What's special about notebooks, though, as you mentioned earlier, is that they're great environments for exploring your data and interactively prototyping algorithms, querying databases, et cetera. Streamlit is not designed for that. Streamlit is for when you're finished with all those things and you just want to create an application. What I noticed is that for a lot of Streamlit applications, if you go to the GitHub repo, there's an .ipynb file and then there's a Streamlit file, and it's a direct port. Marimo bridges that gap: every notebook can be seamlessly run as an app, because we understand the relationships across cells, and Marimo ships with a bunch of interactive elements like sliders, data frame transformers, et cetera. From the command line (and I took this inspiration from Streamlit), you can say marimo run mynotebook.py, and it'll hide the code and run the notebook as a web app. So, to your original question, I think the reason Streamlit doesn't solve this full problem is that you can't start your data work in a Streamlit file, whereas you can in Marimo.
Like, you have one line, and it's going, and you're seeing the results? Is that what you mean?
Yeah, more like this: in Streamlit, if you interact with a slider or something, it runs your whole script from top to bottom. In Marimo, we have a really granular DAG, so if you interact with a slider, it'll only run the cells that depend on it. So you still have that fast, iterative, interactive programming environment that I think many people have become familiar with after using Jupyter for so many years.
Yeah, that makes sense. Anthony Goldbloom was talking to me about his use of Marimo, and I think he had that same workflow, where he'd build notebooks and then port them to Streamlit, so he was so excited to basically avoid that step in order to publish stuff.
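The difference Akshay draws between Streamlit's rerun-the-whole-script model and a granular DAG can be illustrated by counting the work done after a single slider change. The notebook, its dependency map, and the per-cell costs below are all hypothetical, and real Streamlit apps often soften full reruns with caching (for example st.cache_data), which this sketch ignores.

```python
from graphlib import TopologicalSorter

# Hypothetical notebook: cell -> cells it reads from.
PARENTS: dict[str, set[str]] = {
    "load_data": set(),
    "clean": {"load_data"},
    "threshold_slider": set(),  # the UI element the user drags
    "filtered": {"clean", "threshold_slider"},
    "chart": {"filtered"},
    "summary_stats": {"clean"},  # does not read the slider
}

# Hypothetical cost of running each cell, in seconds.
COST = {"load_data": 30, "clean": 20, "threshold_slider": 0,
        "filtered": 1, "chart": 1, "summary_stats": 5}


def script_style_rerun(parents: dict[str, set[str]]) -> list[str]:
    """Streamlit-style: any widget interaction reruns every cell."""
    return list(TopologicalSorter(parents).static_order())


def reactive_rerun(changed: str, parents: dict[str, set[str]]) -> list[str]:
    """DAG-style: rerun only the changed cell and the cells downstream of it."""
    order = list(TopologicalSorter(parents).static_order())
    affected = {changed}
    for cell in order:  # parents precede children, so one pass suffices
        if parents[cell] & affected:
            affected.add(cell)
    return [cell for cell in order if cell in affected]


full = script_style_rerun(PARENTS)
partial = reactive_rerun("threshold_slider", PARENTS)
full_cost = sum(COST[c] for c in full)        # 57
partial_cost = sum(COST[c] for c in partial)  # 2
print(partial)  # ['threshold_slider', 'filtered', 'chart']
```

The slow loading and cleaning cells do not read the slider, so the DAG-style rerun skips them entirely, which is what keeps interaction fast without restructuring the notebook.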
He was telling me, "I went from six hours to five minutes to deploy this stuff," and I was like, how is that possible, Anthony? And then I realized, oh, because you were translating your notebooks into Streamlit, and now you clearly don't have to with Marimo to deploy them as web apps, I guess.
Yeah, you just get it for free.
Well, cool. I mean, tell me: it seems like adoption has been pretty impressive. Are you intentional about growing adoption, or do you feel like, hey, I make the best product and it's naturally viral because people share notebooks and then others use it? Or do you have growth hacks that you're using to drive that adoption?
We're trying to do both. In the past couple of months we've started leaning into the natural virality, and people have started to share their notebooks more. We recently did an integration with Hugging Face, so it's really easy to deploy your Marimo notebooks as apps on Spaces. And we have a WebAssembly-powered playground, so you can create unlimited notebooks on our playground for free and share them with a link. In the early days, we really started to spread off of posts on Hacker News, and some of those really took off. Our Show HN is like the second-top Python Show HN; you drill down and you eventually get to it. So we got a lot of users initially through Hacker News. We haven't done too many growth-hacky things, probably just because we're not expert growth hackers.
I feel like on Show HN, when it works well, the comments can be mean. Was there pushback from the Hacker News crowd?
We were actually lucky. No, people were really quite nice.
That's good to hear.
Yeah, I think Simon Willison, co-creator of Django, who's now doing a bunch of cool projects like Datasette.
He commented, and it was really positive, and maybe that set the tone or something. I'm not sure. People were pretty nice.
You know, one of the things I noticed about your background is that you spent a little bit of time at Netflix. Is that right?
I did an internship.
An internship, yeah. And I was thinking, Netflix famously runs a lot of their production stuff on notebooks, or at least that's what they were saying four or five years ago. Was that still the case when you were there, and did that inform any of your thinking on Marimo?
I believe it was still the case when I was there, but I actually wasn't in that part of the org. I was more on the algorithms engineering side, so I probably used zero notebooks during that internship, and it didn't really inform my thinking too much. They've managed to make it work, but I think the way Jupyter is set up, you have to jump through a bunch of hoops to make that work, and it is cool that they were able to. Most of my inspiration actually came from, I think, two projects. Mostly it came from a project from the Julia language called Pluto.jl, which is an alternative to Jupyter that Marimo shares a lot of similarities with. So Pluto, and by extension Observable, since Pluto is modeled after it, as well as Streamlit to an extent.
Interesting. And what did you pull from these projects? Tell me about that.
I pulled a lot from Pluto. Pluto is a reactive notebook for Julia that has the same spreadsheet-style automatic execution. It lets you create sliders and other UI elements seamlessly. It has built-in package management, like Marimo does too, so that your notebook files are a self-contained, reproducible unit. So I pulled all these ideas from Pluto. I saw it in, I think, late 2021, when I had just finished my PhD, and I was like, this is an amazing project. In the Julia world, people are switching from Jupyter to Pluto.
It's the most-starred GitHub repo for the Julia programming language, aside from the Julia language itself. So when I saw this, I immediately thought we need something like this for Python. And from Streamlit, what I pulled was this: I saw Pluto and thought, wow, these things look like web apps. Then I saw Streamlit, and these things are web apps. So I realized you can merge the two together and have a single notebook-like thing that lets you do the notebooking and also create the data apps.
Are you a fan of the Julia language? How did you find yourself working with Julia notebooks?
I admire what they're trying to do. I used Julia a bit, a long time ago, during my undergrad. I did math and machine learning at Stanford as well as computer systems, so the idea of having a single language that solves the two-language problem, bridging usability and performance, was really appealing. I think it's still a relatively new language, so it has some things that make it difficult to adopt. But I actually wasn't using Pluto notebooks. At the end of my PhD, I just did a broad survey of a bunch of different tools for working with data. I wanted to see what people were working on, I went down rabbit holes, somehow I found Pluto, and I immediately fell in love with the project.
Oh, interesting. So were you doing a survey because you were like, hey, I want to start a company doing tooling for machine learning?
Yeah, that's right. So what happened was, I used to work at Google Brain, where I worked on TensorFlow. In my PhD, I did machine learning and optimization, but a huge part of it was open-source tooling for machine learning and optimization.
And I realized that the thing I like most is building dev tools that let other people solve problems, instead of solving the problems myself. So that's why I did this survey: I wanted to see how I could use my systems background and my ML background to make the biggest impact.
Cool. What else did you like? What other rabbit holes did you go down in that survey?
Oh, yeah, man. I should pull up my Obsidian vault; it's pretty gnarly. One of them was making it a lot easier to access cloud computing, in a few different directions. One was provisioning, and I think Modal and the folks from the Sky Computing Lab at Berkeley are doing this really well now. Every PhD student feels like, wow, it's absurdly difficult to run something in the cloud; I just want to ship this one function there and run it on a GPU. It's cool to see that happening. So that was one. The other one was related to what I did at Netflix, actually, which was studying cluster management and optimizing cluster usage to minimize costs and maximize utilization. I think a company called Antimetal, and probably a few others, are doing this now.
Interesting. And how did you choose notebooks?
Yeah. You know, there was a systems part of me that was really attracted to these cloud computing problems. But then there was the part of me that liked designing for humans and individuals, and that part was really attracted to the notebook project. Ultimately, after watching a bunch of people and myself do research and work with data, I saw that a lot of it starts in notebooks. And people love notebooks, but they also kind of hate them. If you just search for Jupyter notebooks with site:reddit.com, you will find people complaining about hidden state, like, "Oh, God, the JSON file format."
People feel so strongly that something better should exist, and nothing existed. Once I saw Pluto, I was like, okay, there is a way to make something better, and this will have a huge impact on so many people. To me, it was clear that this is what I should do next.
You know, it's funny. I think I saw an interview with the founder of Jupyter, and he was talking about how he was inspired by Mathematica, which is something I remember from my math undergrad at Stanford. I don't know if people even still use it, but it actually had some more delightful features, where you could do more complicated inputs than just a linear block of code. I'm curious if you have any experience with Mathematica, or view it as an interesting kind of notebook.
Actually, I embarrassingly don't, probably because I was a CS major first and a math minor second.
Ah, I see.
I do know that some of the folks in the math department, or who come from more of an EE background, still use Mathematica. I'm sure Marimo somehow has the imprint of Mathematica on it, but probably through indirection, through its impact on Jupyter and Pluto.
It's kind of like a grandchild of Mathematica, I guess, in that case.
Yeah.
Cool. Well, look, I want to ask you: are you aware of some of the real-world use cases? It seems like one of the really cool things about what you're doing is that it's so broad. What are some of the interesting things that people are doing today with Marimo?
Yeah, so there's a lot. That is one of the fun things, because it is so broad. You talked about Anthony Goldbloom, and he represents, I think, one persona. There are many people like him who first use a Marimo notebook to explore some data, but ultimately what they end up with is a little mini reusable app, a tool they can use on a daily basis to do some analytics work or some kind of internal task.
So that's one of them, and maybe that's the Streamlit-style application: I'm creating tools for myself and for my company to do analytics and other things. What's interesting is that it's not just cases like Anthony's, where people typically interact with data. I've also had people reach out to me saying, "I'm a software engineer. I've never used notebooks before. I never wanted to. But Marimo appeals to me because of the developer experience, and I'm using it to manage and build a dashboard for my EKS cluster," or something like that.
I'm so surprised to hear that, because I feel like the first interaction with Marimo is just like a notebook.
Yeah.
At least for me, it took me a little while of messing around to even notice what the difference was. Are notebooks so offensive to a software developer? It's hard for me to imagine that.
Yeah, it's surprising. A number of people come into our Discord and they're like, "Oh yeah, but I haven't ever used a notebook."
Wow.
When I'm trying to get context and I ask them about it, they're like, "I've never used notebooks. I never wanted to." I think maybe it's because we mention the Git-friendly, stored-as-pure-Python part, and software developers are like, "Oh, okay, cool. That makes sense. I can work with that."
Totally.
You know, actually, a bit of a tangent, and I can come back to the use cases, but you mentioned, or I mentioned, Git-friendly. I am pleasantly surprised, but really quite surprised, by the extent to which that is the hook that has brought in a ton of our users. I thought it would be the reproducibility, or the super-duper interactive elements. For example, if you output a data frame in Marimo, you automatically get a filterable, searchable table that lets you page through all the data, not just a tiny static HTML preview like you get in Jupyter.
I thought that was the thing that would bring people in. But I keep on hearing, over and over again, that it's the Git-friendliness. I gave a talk at UIUC to a business analytics class last week. I talked about all the data-related features, including our built-in support for SQL, and at the end, the professor was like, "I'm going to use this because it's Git-friendly."
I'm so surprised. I've used notebooks for years. I've used Git for years. I feel like the execution model of Marimo is really cool, and deploying to the web is super cool. The Git-friendliness, I didn't even notice. I guess people are freaking out about big diffs. I don't know; maybe I just don't look at them.
Yeah, I think they get scared by the diff.
Yeah. How do you actually handle it? Do you not store the output? Because what if the output is a giant table?
Yeah, so we don't store the output, but we have an option to store the outputs alongside the notebook.
I see.
Yeah, so they're just in separate files, and you may not track your HTML with Git, or if you did, you would know to ignore that diff, and you could at least see the code diff.
Because there is sometimes a nice experience with a notebook of looking through it without having to run it yourself.
A hundred percent. That's why we have that snapshotting feature. But yeah, I think a lot of people come for the Git-friendliness, but then they stay for all the other goodies they discover along the way, like the execution model, the interactive elements, and all those things.
Do you even mention Git-friendliness on your website? Or is that just word of mouth?
We do mention it.
Somehow it didn't even register. I've got to look at your website now. Oh yeah, Git-friendly. You're right. Wow, who knew people were getting mad about the...
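Why a small code change can produce a giant diff in a JSON notebook but not in a pure-Python file can be checked with difflib. The JSON layout here is a simplified stand-in loosely modeled on .ipynb (an assumption: real files carry more metadata, execution counts, and so on), with the cell's printed output stored next to its source, as in the Jupyter format described in the episode. The helper names are hypothetical.

```python
import contextlib
import difflib
import io
import json


def run_cell(code: str) -> str:
    """Execute a cell and capture what it prints (a stand-in for a kernel)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue()


def as_ipynb_like_json(code: str) -> str:
    """Serialize code *and* its output together, loosely modeled on .ipynb."""
    cell = {
        "cell_type": "code",
        "source": code.splitlines(keepends=True),
        "outputs": [{
            "output_type": "stream",
            "name": "stdout",
            "text": run_cell(code).splitlines(keepends=True),
        }],
    }
    return json.dumps({"cells": [cell]}, indent=1) + "\n"


def changed_lines(old: str, new: str) -> int:
    """Count added plus removed lines in a unified diff (headers excluded)."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return sum(
        1 for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


old_code = "scale = 2\nfor i in range(20):\n    print(i, i * scale)\n"
new_code = old_code.replace("scale = 2", "scale = 3")

py_diff = changed_lines(old_code, new_code)  # code-only file: 2 lines
json_diff = changed_lines(as_ipynb_like_json(old_code), as_ipynb_like_json(new_code))
print(py_diff, json_diff)
```

Changing one line of source also rewrites most of the stored output, so the JSON diff is dominated by output lines while the .py diff stays at one removed and one added line. Keeping outputs out of the source file, with optional snapshots stored separately, is what keeps review focused on the code.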
Yeah, so I don't come from a business analytics background; I come more from the machine learning background. So I was surprised that the BI people also like it. I talked to someone else who does business analytics, I think at Capital One, and she was also like, "Oh yeah, it's Git-friendly." I think she was traumatized by having to deal with Tableau dashboards where you have no idea of the provenance of what you're looking at, and if an audit comes, you get really scared. I'm going to stop talking, because I don't know anything about that field.
Totally.
Yeah, so that was a tangent. I'm sorry, we were talking about use cases.
Yeah, yeah, yeah. It sounds like BI is an important use case for you. But more practically, do you get stories of people using it in interesting ways?
Yeah, so Anthony is building his little mini apps; he calls them mini apps. A lot of people use it as just a better Jupyter notebook. Everything you would do in a Jupyter notebook, whether it's munging data frames or training a machine learning model, people will just use it for that workflow, and they'll find that the reactive execution and the UI elements like sliders, dropdowns, et cetera, make them way more productive. The simple experience of having a data frame where you can type into a search box and get it filtered in real time is, I think, a big step up from the experience you have in a Jupyter notebook. So that's one. And I think a lot of our users, especially the machine learning folks, get a little scared by the automatic execution when they first hear about Marimo, because they're like, "Oh, I don't want to accidentally kick off a training job on GPUs," or "I don't want to hit my OpenAI, you know, whatever endpoint."
So you can configure the runtime to be lazy: if you execute a cell, the downstream cells will be marked as stale, with a visual cue and everything, but they won't automatically run. Then there's a button to bring them back up to date. So people can and do use it for the more expensive sorts of computation as well.
Yeah. Well, okay, so tell me about how you think about AI coding inside of Marimo. Another notable difference that maybe you don't highlight as much is that you have code generation built right into the application. It's kind of cool. It seems like the code generation actually looks at your data as well as the code, which is cool for a data-centric application.
Yeah, so we do have built-in AI integrations. We really try to be batteries-included across a lot of dimensions. There's built-in support for GitHub Copilot and Codeium. And if you bring your own keys, you can generate code using the context of your notebook, as well as the schemas of your data frames or your attached DuckDB tables. We should probably highlight it more. Yeah, we should. You know, I was talking to Anthony at a table, and he was talking about some other product, and he's like, "Oh yeah, what's really cool is that I told that product to generate some Matplotlib code and then change it to Plotly, and it did." And I was like, "Oh, you can do that in Marimo." He's like, "What? Oh." So yeah, we should highlight it more.
You know, it's funny, I actually started doing notebooks and Streamlit inside of Cursor, just because it was so fast to develop, and I was used to the code tools. Could you also develop Marimo notebooks inside VS Code or something like that? Is that a common pattern?
Sort of.
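The lazy runtime option described here (run a cell, mark its dependents stale, run them only on request) can be modeled in a few lines. ToyRuntime and its methods are hypothetical names for illustration; they are not Marimo's API or configuration, and the three-cell graph is made up, with "train" standing in for the expensive GPU job mentioned above.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class ToyRuntime:
    """Hypothetical reactive runtime with an eager/lazy switch."""

    parents: dict[str, set[str]]  # cell -> cells it reads from
    lazy: bool = False
    stale: set[str] = field(default_factory=set)
    executed: list[str] = field(default_factory=list)

    def _order(self) -> list[str]:
        return list(TopologicalSorter(self.parents).static_order())

    def _downstream(self, cell: str) -> set[str]:
        """All descendants of `cell` (not including the cell itself)."""
        found: set[str] = set()
        for c in self._order():
            if self.parents[c] & (found | {cell}):
                found.add(c)
        return found

    def run(self, cell: str) -> None:
        self.executed.append(cell)
        self.stale.discard(cell)
        downstream = self._downstream(cell)
        if self.lazy:
            self.stale |= downstream  # flag them in the UI, but don't execute
        else:
            self.executed += [c for c in self._order() if c in downstream]

    def run_stale(self) -> None:
        """The 'bring it back up to date' button: run stale cells in order."""
        self.executed += [c for c in self._order() if c in self.stale]
        self.stale.clear()


graph = {"params": set(), "train": {"params"}, "plot": {"train"}}

eager = ToyRuntime(graph)
eager.run("params")
print(eager.executed)  # ['params', 'train', 'plot']

lazy = ToyRuntime(graph, lazy=True)
lazy.run("params")
print(lazy.executed, sorted(lazy.stale))  # ['params'] ['plot', 'train']
lazy.run_stale()
print(lazy.executed)  # ['params', 'train', 'plot']
```

Both modes use the same dependency graph; the only difference is whether downstream cells are executed immediately or merely recorded as stale until the user asks for them.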
We do have a VS Code extension that does a split-pane kind of thing, where you type in the code cells on the left and it auto-runs the notebook UI on the right. I think we have some more work to do to make that feel more seamless. It's a little tricky because our editor is not quite an IDE, but it almost is. It has so many features, from the data explorer panel to live documentation, as well as the built-in AI stuff, that I feel it's best experienced in the editor. But people do ask, "Oh, can I use it in Cursor? Can I use it in VS Code?"
Well, I'm feeling like the growth team is telling me this podcast is really going off the rails, but I'll take it further off the rails and ask my own tech support questions here. I think one of the great things about Cursor is that you're kind of chatting, and Marimo doesn't feel like it's oriented towards chat-style code generation. Is there a way to ask it more broadly, like, "Hey, what's going on? Can you fix this up?" Or, like Anthony was saying, "Convert Matplotlib to Plotly," or whatever.
You can do that on a per-cell basis; you can refactor the cell. I think as we build out our roadmap for the next year, part of it is IDE features, and part of that will be more modern AI features, such as maybe a Cursor-like experience.
That makes sense. What else is on your roadmap? I mean, how do you even think about it? Are you a two-person organization now?
Yeah, we're two people. We did raise a seed round, which we just announced this week.
Congratulations.
Thank you, yeah. So we are looking to grow the team, but it's been just the two of us so far, me and Myles. As for things we want to build, there's a lot.
Some of them are better, more thorough integrations with SQL. I alluded to the fact that Marimo supports SQL. Marimo is stored as pure Python, but you can embed other languages in it, and one of those is SQL. We actually build a dataflow graph across both Python and SQL, which could be the topic of a whole other podcast, but the point is that in your SQL you can query data frames and get a data frame out, and this is powered by DuckDB. One thing we want to do is build out that experience more: make it easier to connect to other types of databases, whether it's Postgres or whatever you're querying, and have richer column previews so that you can explore all your data at a glance without having to execute many queries. Other things on our roadmap are exploring some of the infrastructure side of things that I mentioned I was interested in early on. There are some interesting things related to hybrid execution. A common model is that a lot of your notebook code can run locally, except for some really expensive cells, which you may want to ship off to remote servers. So that's one thing we're looking at: making it easier to run expensive workloads. And then better support for sharing Marimo notebooks, in two ways. We're building out a community cloud, which is free because it's powered by Pyodide and WebAssembly, meaning it runs entirely in the browser. And we may also look into static site generation, so that in your MkDocs or Sphinx documentation you'll have Marimo notebooks running with live code, interactive widgets, all those things.

I guess that tees up an obvious investor question. Streamlit obviously was sold for a lot of money, but it was sold pre-monetization, or just as they were starting to monetize, and I don't know if they ever really got it meaningfully monetizing.
And Jupyter notebooks are obviously an incredibly successful project that was never intended to be monetized. So it does seem like we haven't quite figured out the right monetization model for this type of thing. How do you think about that?

Yeah, so I think the way Marimo is different from Streamlit is that Streamlit usually only came in at the very end of a data project, for the small fraction of projects where you wanted to deploy an app. So it was focused on just the app deployment. What I think is compelling about our product is that Marimo is where you start: it's where you first start querying data, where you first start training a model, et cetera. And you can also carry it all the way to the data app. I also mentioned earlier that notebooks are the central interface to compute in a lot of commercial products like Databricks, SageMaker, et cetera.

Totally.

So I think there may be an angle for us to lean into there as well. The surface area of the user workflow that we can monetize is much larger than Streamlit's.

Jupyter, I think, intentionally started as a nonprofit and really had no intention to go out into any of these infrastructure-type problems. Did you ever consider a structure like that, or were you always, "Hey, I want to be a venture-backed startup"?

I didn't really. You can just build a lot faster by working on it full-time with a bigger team. So yeah, I was thinking about a startup from the beginning. Although the way we did start let Miles and me really focus on crafting, I think, a delightful product. We started with funding from Stanford's SLAC National Accelerator Laboratory, which was enough for just the two of us for two years.

This is a linear accelerator?

Yeah.

Cool.

Yeah. And so there were some scientists there. I was having coffee with one of them one day.
I knew him from back in my PhD. And he was like, "What are you working on?" And I was like, "Oh, I want to build this notebook thing that also lets you make apps and run it as a script." And he's like, "That's really cool. We really need that. We use Jupyter and it has so many problems. We'll pay you to do it." And I was like, "Wow, that sounds good." So we worked with them for two years, and they gave us a lot of early feedback before we open-sourced it. I think that was really essential for us to make a product like this.

You know, Marimo feels like such a delightfully simple thing, but I'm sure there's so much thought and work that goes into it. Have there been tough product choices along the way, or paths you went down that turned out to be dead ends?

Yeah, I think there have been a few. There are definitely tough technical challenges. In terms of product challenges, I think the biggest one was this: we've talked about how Marimo notebooks are reactive. You run a cell, and dependent cells automatically run or are marked as stale. And then you brought up the objection that, hey, that means I can't have cycles. It also means you can't define the same variable in two different cells, because if you do that, Marimo doesn't know which definition to run first, and your outputs won't be reproducible. So to use Marimo...

I found it very confusing in my first-time user experience.

For sure. Yeah, it's a big break from traditional notebooks. But we felt it was necessary in order to have the amount of reproducibility we wanted. And there was precedent: I saw Pluto did it, and Pluto is doing very well in terms of open-source adoption now. So I thought, okay, people can get used to this. So we stuck with that. We chose: okay, Marimo's a DAG.
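To make the reactivity constraints concrete, here is a toy model in plain Python. It is a sketch under stated assumptions, not marimo's actual implementation: the cell names, the `defs`/`refs` sets, and the helpers `build_graph`, `descendants`, and `on_cell_run` are all hypothetical. Each cell declares the global names it defines and reads; a name defined in two cells is rejected because the run order would be ambiguous, and running a cell either reruns its dependents in dependency order (eager) or only marks them stale (the lazy runtime mode described earlier in the conversation).

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical cells: each lists the global names it defines and the names it reads.
# This is a toy model of a reactive notebook, NOT marimo's implementation.
CELLS = {
    "load": {"defs": {"df"}, "refs": set()},
    "clean": {"defs": {"clean_df"}, "refs": {"df"}},
    "plot": {"defs": {"fig"}, "refs": {"clean_df"}},
    "summary": {"defs": {"stats"}, "refs": {"df"}},
}


def build_graph(cells):
    """Map each cell to the set of cells it depends on (its parents)."""
    owner = {}
    for cell_id, cell in cells.items():
        for name in cell["defs"]:
            if name in owner:
                # Two definitions of one global name would make the run order ambiguous.
                raise ValueError(f"{name!r} is defined in both {owner[name]!r} and {cell_id!r}")
            owner[name] = cell_id
    # Names with no defining cell (builtins, imports, etc.) add no edges.
    return {
        cell_id: {owner[name] for name in cell["refs"] if name in owner}
        for cell_id, cell in cells.items()
    }


def descendants(graph, start):
    """Return every cell that transitively depends on `start`."""
    children = {cell_id: set() for cell_id in graph}
    for cell_id, parents in graph.items():
        for parent in parents:
            children[parent].add(cell_id)
    seen, stack = set(), [start]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def on_cell_run(cells, start, lazy=False):
    """After running `start`, return (cells to rerun in order, cells marked stale)."""
    graph = build_graph(cells)
    affected = descendants(graph, start)
    # static_order() yields parents before children and raises CycleError on cycles.
    order = [c for c in TopologicalSorter(graph).static_order() if c in affected]
    if lazy:
        return [], set(order)  # lazy mode: mark dependents stale, run nothing
    return order, set()        # eager mode: rerun dependents in dependency order


to_run, stale = on_cell_run(CELLS, "load")
print("eager:", to_run, stale)
to_run, stale = on_cell_run(CELLS, "load", lazy=True)
print("lazy:", to_run, sorted(stale))
```

Rejecting duplicate definitions up front is what lets a single topological order exist at all; with two cells both defining `df`, the model has no principled way to decide which one downstream cells should see.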
Essentially, Marimo is a DAG on notebook cells, and it's going to have these constraints. We'll try to explain them to our users, but it's going to have these constraints. We got a number of people, not a lot, but a number, coming into our Discord basically asking, "Can there be an option, a toggle, for me to turn off the DAG so that it's like Jupyter again, and I can just type whatever random code I want, and then toggle the switch back on later?" We considered that. Their reasoning was that it would only increase the amount of stuff you can do in Marimo. But if you think of the larger ecosystem, I think it would also lead to people sharing notebooks with you that were developed in, I don't know, "dangerous mode" or something, and those wouldn't work. The DAG is necessary for the notebook to run as an app and to be run as a script, so all those downstream things we enabled would have broken. So in the end we decided to politely say no: sorry, we cannot let you turn off the DAG.

Okay, but now we're going into personal Lukas tech support. What's your workflow when you want to quickly tweak one thing but not necessarily save it? Say you want to copy some block of code into another cell and just see what happens.

Yeah.

And all the variable names get copied. How do you...

Yeah, you're supposed to... Two recommendations. One: I tell folks to try to use functions as much as possible. Functions introduce a local namespace, right? So variables defined in a function don't get added to the global namespace, and you can redefine as many of those as you like. So first, I'd say, Lukas, wrap that code in a function and then call it multiple times, which requires a little bit more
engineering wrangling, but it can be done. And if you don't want to do that, you can prefix a variable with an underscore. In Marimo, that makes it local to the cell.

Oh, that's cool.

So then you can reuse that name across multiple cells. But because it's local to a cell, you can't access it in another cell, so you don't have any reproducibility issues. That's the escape hatch.

Well, look, I feel really optimistic for you. I feel like you made a really beautiful product that's getting incredible adoption, so it's exciting to catch you early in the journey. It'd be fun to check back in in a year or two and see how things are going. I really appreciate you taking the time to talk to me.

Of course. It was a pleasure. Working at Marimo is definitely a dream job.

Awesome. Well, good luck. Rooting for you.

Thanks, Lukas.

Take care.

You too.

Thanks so much for listening to this episode of Gradient Dissent. Please stay tuned for future episodes.
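The two workarounds Akshay describes can be sketched in plain Python. The first part is ordinary Python scoping: names assigned inside a function stay in that function's local namespace, so the same helper can be called from several cells without redefining any global. The second part is a toy model of the underscore rule, assuming only what is said in the conversation (underscore-prefixed names are local to their cell); the helpers `exported_names` and `find_conflicts` are hypothetical and are not marimo APIs. In an actual Marimo notebook you would simply write `_tmp = ...` in a cell.

```python
# Workaround 1: wrap exploratory code in a function.
def summarize(values):
    total = sum(values)  # `total` lives only in this function's local namespace
    return total / len(values)


mean_a = summarize([1, 2, 3])   # can be called from as many cells as you like;
mean_b = summarize([10, 20])    # no global `total` is ever created


# Workaround 2 (toy model): underscore-prefixed names stay local to their cell.
def exported_names(cell_defs):
    """Hypothetical helper: the names a cell shares with other cells."""
    return {name for name in cell_defs if not name.startswith("_")}


def find_conflicts(cells):
    """Hypothetical helper: names exported by more than one cell."""
    seen, conflicts = set(), set()
    for defs in cells.values():
        for name in exported_names(defs):
            if name in seen:
                conflicts.add(name)
            seen.add(name)
    return conflicts


# Hypothetical cells: `_tmp` appears in two cells without conflict,
# but the shared name `x` defined in cells "a" and "c" is flagged.
example_cells = {"a": {"_tmp", "x"}, "b": {"_tmp", "y"}, "c": {"x"}}
print(find_conflicts(example_cells))
```

The function approach keeps the value reusable across cells, while the underscore approach trades that away: a cell-local name cannot be read by any other cell, which is why it causes no reproducibility issues.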
