That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the capacity to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just spotted that the MIT license files say "Copyright (c) 2023 DeepSeek", so they may need to pay a little more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

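If you don't have the first two of those installed yet, here's a minimal sketch of the setup, assuming Homebrew for Ollama and pip for LLM (the other documented install routes work too):

```bash
# Install Ollama, which fetches and runs local GGUF models
brew install ollama

# Install the LLM command-line tool
pip install llm
```
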
First I fetched and ran the model using Ollama itself:

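Most likely something like this, assuming Ollama's hf.co/ syntax for pulling GGUF builds straight from Hugging Face; the Q8_0 quantization tag is my assumption, picked to match the roughly 8.5GB download mentioned below:

```bash
# Fetch the Unsloth GGUF build of the 8B distill and start an interactive chat
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
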
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

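For example, you can locate that database and review the most recent entry with the llm logs subcommand (a sketch, assuming a recent version of the CLI):

```bash
# Show where LLM keeps its SQLite log database
llm logs path

# Print the most recently logged prompt and response
llm logs -n 1
```
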
I installed the plugin and ran it like this:

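A sketch of those two steps, reusing the same Hugging Face model reference that Ollama pulled above (the exact model ID string is an assumption):

```bash
# Install the plugin that teaches LLM to talk to Ollama-served models
llm install llm-ollama

# Run a prompt through the distilled model
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
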
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such a fascinating insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):

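Presumably another Unsloth GGUF pulled through Ollama; the Q3_K_M quantization tag here is an assumption, picked because it lands close to that 34GB figure:

```bash
# Fetch and run the quantized 70B distill
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
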
Can it draw a pelican?

I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:

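For the record, against the local model that prompt looks something like this (same assumed model ID as earlier):

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'Generate an SVG of a pelican riding a bicycle'
```
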
It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

<br>[DeepSeek](https://www.itsallsavvy.com) use the design through their API, [utilizing](http://tallercastillocr.com) an [OpenAI-imitating endpoint](http://wasserskiclub.de). You can access that via LLM by [dropping](https://rightlane.beparian.com) this into your [extra-openai-models](https://nanaseo.com). [yaml configuration](https://www.dobreljekarne.hr) file:<br>
|
||||
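A minimal entry, assuming LLM's extra-openai-models.yaml schema; the api_base follows DeepSeek's documented endpoint, and the api_key_name lines up with the llm keys set deepseek step below:

```yaml
# Register DeepSeek's hosted reasoner as an OpenAI-compatible model
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
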
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

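Put together (the pelican prompt is just a placeholder):

```bash
# Store the API key once
llm keys set deepseek

# Then run prompts against the hosted reasoner model
llm -m deepseek-reasoner 'ten great names for a pet pelican'
```
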
This won't show you the reasoning tokens, unfortunately. Those are provided by the API (example here) but LLM doesn't yet have a way to display them.