How to transform large groups of similar crappy html pages into quality css-based pages?

哑剧 · posted 2021-11-25 · 1,886 characters · 843 views · 5 replies · Original

What is the best way to programmatically transform large batches of very similar web pages into a newer CSS-based layout?

I am converting all the content of an old website to a new CSS-based layout. Many of the pages are very similar, and I want to be able to automate the process.

What I am currently thinking of doing is to read the pages in using HtmlAgilityPack, and make a method for each group of similar pages that will create the output text.

What do you think is the best way to do this? The pages mostly differ in details such as which .jpg file is used for an image, or how many heading-image-text groups a particular page contains.
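The read-and-extract step described above could look like the following sketch. It uses Python's stdlib `html.parser` as a stand-in for HtmlAgilityPack, and the tag names and page shape (one `<h1>`, some `<img>` tags) are assumptions, not the asker's actual markup:

```python
# Sketch: collect the parts that differ between similar pages
# (image sources, heading text). Tag choices are illustrative.
from html.parser import HTMLParser

class VariablePartExtractor(HTMLParser):
    """Collects the variable parts of one page."""
    def __init__(self):
        super().__init__()
        self.images = []      # src attribute of every <img>
        self.headings = []    # text of every <h1>/<h2>
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs).get("src"))
        elif tag in ("h1", "h2"):
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in ("h1", "h2"):
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading and data.strip():
            self.headings.append(data.strip())

extractor = VariablePartExtractor()
extractor.feed('<h1>Page One</h1><img src="photo1.jpg"><p>body</p>')
print(extractor.images)    # ['photo1.jpg']
print(extractor.headings)  # ['Page One']
```

A per-group method would then take the extracted values and emit the new CSS-based markup around them.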

EDIT: I cannot use any file type other than .html, as that is all I am authorized to use. Any suggestions?

EDIT2: Ideally, I would also be able to make this be generic enough that I could use it for many different groups of html files by just switching around a few moving parts.


The above link is a sample of what I am dealing with. The parts that would differ between pages would be:

  • the meta description tag
  • various headers, especially the main header
  • almost every image on the page will be new
  • the text for each video will be unique, but they will be grouped together in similar chunks
  • the video files, and video sizes will be unique

Everything else is the same, and the format of the pages is also the same.

EDIT3: Another thing that might be helpful is to write some code that will write the pages for me. I just need to cut the variable parts out of the originals and put them into a data file that gets read and used to write the new versions.
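EDIT3's data-file idea could be sketched like this. The use of JSON for the data file and field names like `title` and `image` are assumptions for illustration; any structured format would do:

```python
# Sketch: pour per-page data from a data file into one fixed template.
import json
from string import Template

PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head><meta name="description" content="$description"></head>
<body>
  <h1>$title</h1>
  <img src="$image" alt="$title">
</body>
</html>""")

def write_pages(data_path):
    """Read a JSON list of per-page dicts and write one .html file each."""
    with open(data_path) as f:
        pages = json.load(f)
    for page in pages:
        html = PAGE_TEMPLATE.substitute(page)
        with open(page["filename"], "w", encoding="utf-8") as out:
            out.write(html)
```

Switching the template and the data file is what makes this generic enough to reuse across different groups of pages, as EDIT2 asks.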






荭秂 · 2022-06-07 · Answer #5

When faced with old, often generated code like this, I tend to lean towards search and replace in my text editor.

Sounds awful, doesn't it?

Seriously though, if you get a powerful editor that supports searching across multiple files and/or regular expressions, that can remove the bulk of the nasty code. It's not a perfect science, to say the least, and some manual manipulation may be necessary to get it into a "useful" form, but it takes away the bulk of the cleanup work.
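A scripted version of that multi-file search and replace might look like the sketch below. The pattern shown (stripping `<font>` tags) is only an example of the kind of generated markup such a pass can remove; real cleanup would use a list of patterns:

```python
# Sketch: apply a regex cleanup pass to every .html file under a root.
import re
from pathlib import Path

FONT_TAG = re.compile(r"</?font[^>]*>", re.IGNORECASE)

def clean_file(path: Path) -> None:
    """Rewrite one file in place if the pattern removed anything."""
    text = path.read_text(encoding="utf-8")
    cleaned = FONT_TAG.sub("", text)
    if cleaned != text:
        path.write_text(cleaned, encoding="utf-8")

def clean_tree(root: str) -> None:
    for html_file in Path(root).glob("**/*.html"):
        clean_file(html_file)
```

Run it against a copy of the site first; regexes over HTML are exactly the "not a perfect science" the answer warns about.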

偷得浮生 · 2022-06-07 · Answer #4

Depending on the page, you could write scripts in Perl, or any other scripting language you're comfortable with, to do as much as possible, and have them log anything they couldn't fix or didn't understand.

一杯敬自由 · 2022-06-07 · Answer #3

While this might sound a bit glib, the best real option I could offer would be Rent-A-Coder.

狼性发作 · 2022-06-07 · Answer #2

I think it depends on how many pages there are. If there are not too many, you could create a template and use a WYSIWYG editor to copy and paste the content.

However, if you need to do it programmatically, I would suggest parsing the HTML to extract the content, or cleaning it up. If you have access to it, you can use Expression Web, which I used for a similar task: you can clean the HTML so that only the heading tags, paragraphs, etc. remain, and then apply CSS to format it in the design you wish.

However, it might take longer to write code to do it than to do it manually.
Sometimes nothing is faster than doing it by hand.

Good luck

浊酒尽余欢 · 2022-06-07 · Answer #1

It depends on how similar "very similar" actually is. If you mean that they effectively use a number of templates, then I would probably build new templates for the new design using Template-Toolkit and pull the data out using Template::Extract, possibly storing the data in a local database to make it easier to rebuild the pages in the future.
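Template::Extract is a Perl module; the Python sketch below only mimics its core idea, which is to treat the old page as a template with named holes and pull the data back out of real pages. The `[% name %]` placeholder syntax is borrowed from Template-Toolkit, and the sample page is invented:

```python
# Sketch: invert a template into a regex with named capture groups,
# then extract per-page data from existing pages.
import re

def template_to_regex(template):
    """Turn '<h1>[% title %]</h1>' into a regex capturing 'title'."""
    # re.split with a capture group alternates literal text and hole names
    parts = re.split(r"\[%\s*(\w+)\s*%\]", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)       # literal template text
        else:
            pattern += f"(?P<{part}>.*?)"    # a named hole
    return re.compile(pattern, re.DOTALL)

old_page = '<h1>Holiday Video</h1><img src="beach.jpg">'
rx = template_to_regex('<h1>[% title %]</h1><img src="[% image %]">')
data = rx.match(old_page).groupdict()
print(data)  # {'title': 'Holiday Video', 'image': 'beach.jpg'}
```

The extracted dict is exactly what you would store in the local database and later feed into the new-design templates.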